Abstract. We introduce estimation and test procedures through divergence optimization for discrete or continuous parametric models. This approach is based on a new dual representation for divergences. We treat point estimation and tests for simple and composite hypotheses, extending the maximum likelihood technique. Another view of the maximum likelihood approach, for estimation and test, is given. We prove existence and consistency of the proposed estimates. The limit laws of the estimates and test statistics (including the generalized likelihood ratio one) are given both under the null and the alternative hypotheses, and an approximation of the power functions is deduced. A new procedure for the construction of confidence regions, when the parameter may be a boundary value of the parameter space, is proposed. Also, a solution to the irregularity problem of the generalized likelihood ratio test pertaining to the number of components in a mixture is given, and a new test is proposed, based on the $\chi^2$-divergence on signed finite measures and the duality technique.
1. Introduction and notation

Let $(\mathcal{X},\mathcal{B})$ be a measurable space and $P$ be a given probability measure (p.m.) on $(\mathcal{X},\mathcal{B})$. Denote by $\mathcal{M}$ the real vector space of all signed finite measures on $(\mathcal{X},\mathcal{B})$ and by $\mathcal{M}(P)$ the vector subspace of all signed finite measures absolutely continuous (a.c.) with respect to (w.r.t.) $P$. Denote also by $\mathcal{M}^1$ the set of all p.m.'s on $(\mathcal{X},\mathcal{B})$ and by $\mathcal{M}^1(P)$ the subset of all p.m.'s a.c. w.r.t. $P$. Let $\phi$ be a proper¹ closed² convex function from $]-\infty,+\infty[$ to $[0,+\infty]$ with $\phi(1)=0$ and such that its domain $\mathrm{dom}\,\phi := \{x\in\mathbb{R} \text{ such that } \phi(x)<\infty\}$ is an interval with endpoints $a_\phi < 1 < b_\phi$ (which may be finite or infinite). For any signed



Date: December 22, 2007. 

¹ We say a function is proper if its domain is nonempty.

² The closedness of $\phi$ means that if $a_\phi$ or $b_\phi$ are finite numbers then $\phi(x)$ tends to $\phi(a_\phi)$ or $\phi(b_\phi)$ when $x \downarrow a_\phi$ or $x \uparrow b_\phi$, respectively.
finite measure $Q$ in $\mathcal{M}(P)$, the $\phi$-divergence between $Q$ and $P$ is defined by

    $$D_\phi(Q,P) := \int \phi\left(\frac{dQ}{dP}(x)\right) dP(x).$$    (1.1)

When $Q$ is not a.c. w.r.t. $P$, we set $D_\phi(Q,P) = +\infty$. The $\phi$-divergences were introduced by Csiszár (1963) as "f-divergences". For all p.m. $P$, the mappings $Q \in \mathcal{M} \mapsto D_\phi(Q,P)$ are convex and take nonnegative values. When $Q = P$ then $D_\phi(Q,P) = 0$. Furthermore, if the function $x \mapsto \phi(x)$ is strictly convex on a neighborhood of $x = 1$, then the following fundamental property holds

    $$D_\phi(Q,P) = 0 \text{ if and only if } Q = P.$$    (1.2)

All these properties are presented in Csiszár (1963, 1967a,b) and Liese and Vajda (1987) chapter 1, for $\phi$-divergences defined on the set of all p.m.'s $\mathcal{M}^1$. When the $\phi$-divergences are defined on $\mathcal{M}$, then the same properties hold. Let us conclude these few remarks by noting that in general $D_\phi(Q,P)$ and $D_\phi(P,Q)$ are not equal. Hence, $\phi$-divergences usually are not distances, but they merely measure some difference between two measures. Of course a main feature of divergences between distributions of random variables $X$ and $Y$ is the invariance property with respect to a common smooth change of variables.



1.1. Examples of $\phi$-divergences. When defined on $\mathcal{M}^1$, the Kullback-Leibler ($KL$), modified Kullback-Leibler ($KL_m$), $\chi^2$, modified $\chi^2$ ($\chi^2_m$), Hellinger ($H$), and $L_1$ divergences are respectively associated to the convex functions $\phi(x) = x\log x - x + 1$, $\phi(x) = -\log x + x - 1$, $\phi(x) = \frac{1}{2}(x-1)^2$, $\phi(x) = \frac{1}{2}(x-1)^2/x$, $\phi(x) = 2(\sqrt{x}-1)^2$ and $\phi(x) = |x-1|$. All these divergences, except the $L_1$ one, belong to the class of the so-called "power divergences" introduced in Cressie and Read (1984) (see also Liese and Vajda (1987) chapter 2). They are defined through the class of convex functions

    $$x \in\, ]0,+\infty[\; \mapsto \phi_\gamma(x) := \frac{x^\gamma - \gamma x + \gamma - 1}{\gamma(\gamma-1)}$$    (1.3)

if $\gamma \in \mathbb{R}\setminus\{0,1\}$, $\phi_0(x) := -\log x + x - 1$ and $\phi_1(x) := x\log x - x + 1$. (For all $\gamma \in \mathbb{R}$, we define $\phi_\gamma(0) := \lim_{x\downarrow 0}\phi_\gamma(x)$.) So, the $KL$-divergence is associated to $\phi_1$, the $KL_m$ to $\phi_0$, the $\chi^2$ to $\phi_2$, the $\chi^2_m$ to $\phi_{-1}$ and the Hellinger distance to $\phi_{1/2}$.
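As a quick numerical illustration of the family (1.3), the following sketch evaluates $\phi_\gamma$ and compares it with the closed forms listed above for $\gamma = 2$, $-1$ and $1/2$. The function name `phi_gamma` and the dictionary `closed_forms` are hypothetical helpers introduced only for this example.

```python
import math

def phi_gamma(x, gamma):
    """Power-divergence function (1.3) for x > 0 (hypothetical helper name)."""
    if x <= 0:
        raise ValueError("this sketch only covers x > 0")
    if gamma == 0:                       # modified Kullback-Leibler (KL_m)
        return -math.log(x) + x - 1
    if gamma == 1:                       # Kullback-Leibler (KL)
        return x * math.log(x) - x + 1
    return (x ** gamma - gamma * x + gamma - 1) / (gamma * (gamma - 1))

# Closed forms quoted in the text for three members of the family.
closed_forms = {
    2: lambda x: 0.5 * (x - 1) ** 2,              # chi-square
    -1: lambda x: 0.5 * (x - 1) ** 2 / x,         # modified chi-square
    0.5: lambda x: 2 * (math.sqrt(x) - 1) ** 2,   # Hellinger
}
```

Every member of the family vanishes at $x = 1$, which is the normalisation $\phi(1) = 0$ required in the introduction.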

We extend the definition of the power divergence functions $Q \in \mathcal{M}^1 \mapsto D_{\phi_\gamma}(Q,P)$ onto the whole vector space $\mathcal{M}$ of all signed finite measures via the extension of the definition of the convex functions $\phi_\gamma$: for all $\gamma \in \mathbb{R}$ such that the function $x \mapsto \phi_\gamma(x)$ is not defined on $]-\infty,0[$ or defined but not convex on the whole of $\mathbb{R}$, set

    $$x \in\, ]-\infty,+\infty[\; \mapsto \begin{cases} \phi_\gamma(x) & \text{if } x \in [0,+\infty[, \\ +\infty & \text{if } x \in\, ]-\infty,0[. \end{cases}$$    (1.4)

Note that for the $\chi^2$-divergence, the corresponding $\phi$ function $\phi_2(x) = \frac{1}{2}(x-1)^2$ is defined and convex on the whole of $\mathbb{R}$.



In this paper, we are interested in estimation and test using $\phi$-divergences. An i.i.d. sample $X_1,\ldots,X_n$ with common unknown distribution $P$ is observed and some p.m. $Q$ is given. We aim to estimate $D_\phi(Q,P)$ and, more generally, $\inf_{Q\in\Omega} D_\phi(Q,P)$ where $\Omega$ is some set of measures, as well as the measure $Q^*$ achieving the infimum on $\Omega$. In the parametric context, these problems can be well defined and lead to new results in estimation and tests, extending classical notions.

1.2. Statistical examples and motivations. 

1.2.1. Tests of fit. Let $Q_0$ and $P$ be two p.m.'s with the same support $S$. Introduce a finite partition $A_1,\ldots,A_k$ of $S$ (when $S$ is finite this partition is the support of $Q_0$). The quantization method consists in approximating $D_\phi(Q_0,P)$ by $\sum_{j=1}^k \phi\left(\frac{Q_0(A_j)}{P(A_j)}\right) P(A_j)$, which is estimated by

    $$\widehat{D}_\phi(Q_0,P) := \sum_{j=1}^k \phi\left(\frac{Q_0(A_j)}{P_n(A_j)}\right) P_n(A_j),$$

where $P_n$ is the empirical measure associated to the data. In this vein, goodness of fit tests have been proposed by Zografos et al. (1990) for a fixed number of classes, and by Menéndez et al. (1998) and Györfi and Vajda (2002) when the number of classes depends on the sample size. We refer to Pardo (2006), which treats these problems extensively and contains many more references.
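The quantization estimate above can be sketched in a few lines. This is a minimal illustration, assuming the observations have already been mapped to cell indices $0,\ldots,k-1$ and that every cell contains at least one observation; the names `phi0` and `quantized_divergence` are hypothetical.

```python
import math
from collections import Counter

def phi0(x):
    """phi(x) = -log x + x - 1 (the KL_m choice); any convex phi with phi(1) = 0 works."""
    return -math.log(x) + x - 1

def quantized_divergence(q0_probs, sample_cells, phi=phi0):
    """Plug-in estimate sum_j phi(Q0(A_j) / P_n(A_j)) * P_n(A_j).

    q0_probs[j]  : Q0(A_j) for the cells j = 0..k-1
    sample_cells : cell index of each observation
    Empty cells are rejected: this sketch does not handle P_n(A_j) = 0.
    """
    n = len(sample_cells)
    counts = Counter(sample_cells)
    total = 0.0
    for j, q in enumerate(q0_probs):
        pn = counts.get(j, 0) / n
        if pn == 0:
            raise ValueError("cell %d is empty; merge cells or enlarge the sample" % j)
        total += phi(q / pn) * pn
    return total

q0 = [0.5, 0.3, 0.2]
exact_fit = [0] * 5 + [1] * 3 + [2] * 2   # empirical frequencies equal to q0
shifted = [0] * 4 + [1] * 4 + [2] * 2     # empirical frequencies (0.4, 0.4, 0.2)
```

The estimate is zero when the empirical cell frequencies coincide with $Q_0$ and positive otherwise.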

1.2.2. Parametric estimation and tests. Let $\{P_\theta;\ \theta\in\Theta\}$ be some parametric model with $\Theta$ a set in $\mathbb{R}^d$. On the basis of an i.i.d. sample $X_1,\ldots,X_n$ with distribution $P_{\theta_T}$, we want to estimate $\theta_T$, the unknown true value of the parameter, and perform statistical tests on the parameter using $\phi$-divergences. When all p.m.'s $P_\theta$ share the same finite support $S$, Liese and Vajda (1987), Lindsay (1994) and Morales et al. (1995) introduced the so-called "Minimum $\phi$-divergence estimates" (M$\phi$DE's) (Minimum Disparity Estimators in Lindsay (1994)) of the parameter $\theta_T$, defined by

    $$\widehat{\theta}_\phi := \arg\inf_{\theta\in\Theta} D_\phi(P_\theta, P_n).$$    (1.5)
Various parametric tests can be performed based on the previous estimates of $\phi$-divergences; see Lindsay (1994) and Morales et al. (1995). The class of estimates (1.5) contains the maximum likelihood estimate (MLE). Indeed, when $\phi(x) = \phi_0(x) = -\log x + x - 1$, we obtain

    $$\widehat{\theta}_{KL_m} := \arg\inf_{\theta\in\Theta} KL_m(P_\theta,P_n) = \arg\inf_{\theta\in\Theta} \sum_{j\in S} -\log(P_\theta(j))\,P_n(j) = \text{MLE}.$$    (1.6)

The M$\phi$DE's (1.5) are motivated by the fact that a suitable choice of the divergence may lead to an estimate more robust than the ML one (see e.g. Lindsay (1994), Basu and Lindsay (1994) and Jiménez and Shao (2001)).
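The identity (1.6) can be checked numerically on a small finite-support model. The sketch below assumes a Binomial(2, $\theta$) model on $S = \{0,1,2\}$ and a hypothetical table of counts; the minimisation in (1.5) is done by a plain grid search, which is only meant as an illustration.

```python
import math

def phi0(x):
    """phi_0(x) = -log x + x - 1, the function defining KL_m."""
    return -math.log(x) + x - 1

def binom2_pmf(theta):
    """P_theta(j) for j = 0, 1, 2 in the Binomial(2, theta) model (assumed example)."""
    return [(1 - theta) ** 2, 2 * theta * (1 - theta), theta ** 2]

def klm(p_model, p_emp):
    """KL_m(P_theta, P_n) = sum_j phi_0(P_theta(j) / P_n(j)) * P_n(j)."""
    return sum(phi0(p / q) * q for p, q in zip(p_model, p_emp))

counts = [3, 5, 2]                      # hypothetical observed counts of 0, 1, 2
n = sum(counts)
p_n = [c / n for c in counts]

grid = [i / 1000 for i in range(1, 1000)]
theta_min_div = min(grid, key=lambda t: klm(binom2_pmf(t), p_n))

# Binomial(2, theta) MLE: sample mean divided by 2.
theta_mle = sum(j * c for j, c in enumerate(counts)) / (2 * n)
```

On this grid the minimum $KL_m$-divergence estimate and the MLE agree, as (1.6) predicts.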



When interested in testing hypotheses $\mathcal{H}_0 : \theta_T = \theta_0$ against alternatives $\mathcal{H}_1 : \theta_T \neq \theta_0$, where $\theta_0$ is a given value, we can use the statistic $D_\phi(P_{\theta_0},P_n)$, the plug-in estimate of the $\phi$-divergence between $P_{\theta_0}$ and $P_{\theta_T}$, rejecting $\mathcal{H}_0$ for large values of the statistic; see e.g. Cressie and Read (1984). In the case when $\phi(x) = -\log x + x - 1$, the corresponding test based on $KL_m(P_{\theta_0},P_n)$ does not coincide with the generalized likelihood ratio one, which is defined through the generalized likelihood ratio (GLR) $\lambda_n := 2\log\frac{\sup_{\theta\in\Theta}\prod_{i=1}^n p_\theta(X_i)}{\prod_{i=1}^n p_{\theta_0}(X_i)}$. The new estimate $\widehat{KL}_m(P_{\theta_0},P_{\theta_T})$ of $KL_m(P_{\theta_0},P_{\theta_T})$, which is proposed in this paper, leads to the generalized likelihood ratio test; see remark 3.7 below.



When the support $S$ is continuous, the plug-in estimates (1.5) are not well defined; Basu and Lindsay (1994) investigate the so-called "minimum disparity estimators" (MDE's) for continuous models, through some common kernel smoothing method of $P_n$ and $P_\theta$. When $\phi(x) = -\log x + x - 1$, this estimate clearly, due to smoothing, does not generally coincide with the ML one. Also, the test based on the associated estimator of $KL_m(P_{\theta_0},P_{\theta_T})$ is different from the generalized likelihood ratio one. Further, their estimates pose the problem of the choice of the kernel and the window. For the Hellinger distance, see Beran (1977). For nonparametric goodness-of-fit tests, Berlinet et al. (1993) and Berlinet (1999) proposed a test based on the estimation of the $KL_m$-divergence using the smoothed kernel estimate of the density. The extension of their results to other divergences remains an open problem; see Berlinet (1999), Györfi et al. (1998), and Berlinet et al. (1993). All those tests are stated for simple null hypotheses; the case of composite null hypotheses seems difficult to handle by the above technique. In the present paper, we treat this problem in the parametric setting.



PARAMETRIC ESTIMATION AND TESTS THROUGH DIVERGENCES AND DUALITY TECHNIQUE



When the support $S$ is discrete infinite or continuous, then the plug-in estimate $D_\phi(P_\theta,P_n)$ usually takes an infinite value when no use is made of some partition-based approximation. In Broniatowski (2003), a new estimation procedure is proposed in order to estimate the $KL$-divergence between some set of p.m.'s and some p.m. $P$, without making use of any partitioning or smoothing, but merely making use of the well known "dual" representation of the $KL$-divergence as the Fenchel-Legendre transform of the moment generating function. Extending the paper by Broniatowski (2003), we will use the new dual representations of $\phi$-divergences (see Broniatowski and Keziou (2006) theorem 4.4 and Keziou (2003) theorem 2.1) to define the minimum $\phi$-divergence estimates in both discrete and continuous parametric models. These representations are the starting point for the definition of estimates of the parameter $\theta_T$, which we will call "minimum dual $\phi$-divergence estimates" (MD$\phi$DE's). They are defined in parametric models $\{P_\theta;\ \theta\in\Theta\}$, where the p.m.'s $P_\theta$ do not necessarily have finite support; it can be discrete or continuous, bounded or not. Also the same representations will be applied in order to estimate $D_\phi(P_{\theta_0},P_{\theta_T})$ and $\inf_{\theta\in\Theta_0} D_\phi(P_\theta,P_{\theta_T})$, where $\theta_0$ is a given value in $\Theta$ and $\Theta_0$ is a given subset of $\Theta$, which leads to various simple and composite tests pertaining to $\theta_T$, the true unknown value of the parameter. When $\phi(x) = -\log x + x - 1$, the MD$\phi$D estimate coincides with the maximum likelihood one (see remark 3.2 below); since our approach also includes test procedures, it will be seen that with this particular choice for the function $\phi$, we recover the classical likelihood ratio test for simple hypotheses and for composite hypotheses (see remark 3.7 and remark 3.10 below). A similar approach has been proposed by Liese and Vajda (2006); see their formula (118).



In any case, an exhaustive study of M$\phi$DE's seems necessary, in a way that would include both the discrete and the continuous support cases. This is precisely the main scope of this paper.



The remainder of this paper is organized as follows. In section 2, we recall the dual representations of $\phi$-divergences obtained by Broniatowski and Keziou (2006) theorem 4.4, Broniatowski and Keziou (2004) theorem 2.4 and Keziou (2003) theorem 2.1. Section 3 presents, through the dual representation of $\phi$-divergences, various estimates and tests in the parametric framework and deals with their asymptotic properties both under the null and the alternative hypotheses. The existence and consistency of the proposed estimates are proved using similar arguments as developed in Qin and Lawless (1994) lemma 1. We use the limit laws of the proposed test statistics, in a similar way to Morales and Pardo (2001), to give an approximation to the power functions of the tests (including the GLR one). Observe that the power functions of the likelihood ratio type tests are not generally known; one of our contributions is to provide explicit power functions in the general case for simple or composite hypotheses. As a by-product, we obtain the minimal sample size which ensures a given power, for quite general simple or composite hypotheses. In section 4, we give a solution to the irregularity problem of the GLR test of the number of components in a mixture; we propose a new test based on the $\chi^2$-divergence on signed finite measures, and a new procedure for the construction of confidence regions for the parameter in the case where $\theta_T$ may be a boundary value of the parameter space $\Theta$. All proofs are in the Appendix. We sometimes write $Pf$ for $\int f\,dP$ for any measure $P$ and any function $f$, when defined.

2. Fenchel Duality for $\phi$-divergences

In this section, we recall a version of the dual representations of $\phi$-divergences obtained in Broniatowski and Keziou (2006), using the Fenchel duality technique. First, we give some notations and some results about the conjugate (or Fenchel-Legendre transform) of real convex functions; see e.g. Rockafellar (1970) for proofs. The Fenchel-Legendre transform of $\phi$ will be denoted $\phi^*$, i.e.,

    $$t \in \mathbb{R} \mapsto \phi^*(t) := \sup_{x\in\mathbb{R}} \{tx - \phi(x)\},$$    (2.1)

and the endpoints of $\mathrm{dom}\,\phi^*$ (the domain of $\phi^*$) will be denoted $a_{\phi^*}$ and $b_{\phi^*}$ with $a_{\phi^*} < b_{\phi^*}$. Note that $\phi^*$ is a proper closed convex function. In particular, $a_{\phi^*} < 0 < b_{\phi^*}$, $\phi^*(0) = 0$ and

    $$a_{\phi^*} = \lim_{y\to-\infty} \frac{\phi(y)}{y}, \qquad b_{\phi^*} = \lim_{y\to+\infty} \frac{\phi(y)}{y}.$$    (2.2)

By the closedness of $\phi$, applying the duality principle, the conjugate $\phi^{**}$ of $\phi^*$ coincides with $\phi$, i.e.,

    $$\phi^{**}(t) := \sup_{x\in\mathbb{R}} \{tx - \phi^*(x)\} = \phi(t), \quad \text{for all } t \in \mathbb{R}.$$    (2.3)

For the proper convex functions defined on $\mathbb{R}$ (endowed with the usual topology), the lower semi-continuity³ and the closedness properties are equivalent. The function $\phi$ (resp. $\phi^*$) is differentiable if it is differentiable on $]a_\phi,b_\phi[$ (resp. $]a_{\phi^*},b_{\phi^*}[$), the interior of its domain. Also $\phi$ (resp. $\phi^*$) is strictly convex if it is strictly convex on $]a_\phi,b_\phi[$ (resp. $]a_{\phi^*},b_{\phi^*}[$).



³ We say a function $\phi$ is lower semi-continuous if the level sets $\{x \in \mathbb{R} \text{ such that } \phi(x) \le \alpha\}$, $\alpha \in \mathbb{R}$, are all closed.
The strict convexity of $\phi$ is equivalent to the condition that its conjugate $\phi^*$ is essentially smooth, i.e., differentiable with

    $$\lim_{t\downarrow a_{\phi^*}} \phi^{*\prime}(t) = -\infty \ \text{ if } a_{\phi^*} > -\infty, \qquad \lim_{t\uparrow b_{\phi^*}} \phi^{*\prime}(t) = +\infty \ \text{ if } b_{\phi^*} < +\infty.$$    (2.4)

Conversely, $\phi$ is essentially smooth if and only if $\phi^*$ is strictly convex; see e.g. Rockafellar (1970) section 26 for the proofs of these properties. If $\phi$ is differentiable, we denote by $\phi'$ the derivative function of $\phi$, and we define $\phi'(a_\phi)$ and $\phi'(b_\phi)$ to be the limits (which may be finite or infinite) $\lim_{x\downarrow a_\phi}\phi'(x)$ and $\lim_{x\uparrow b_\phi}\phi'(x)$, respectively. We denote by $\mathrm{Im}\,\phi'$ the set of all values of the function $\phi'$, i.e., $\mathrm{Im}\,\phi' := \{\phi'(x) \text{ such that } x \in [a_\phi,b_\phi]\}$. If additionally the function $\phi$ is strictly convex, then $\phi'$ is increasing on $[a_\phi,b_\phi]$. Hence, it is a one-to-one function from $[a_\phi,b_\phi]$ to $\mathrm{Im}\,\phi'$. In this case, $\phi'^{-1}$ denotes the inverse function of $\phi'$ from $\mathrm{Im}\,\phi'$ to $[a_\phi,b_\phi]$. If $\phi$ is differentiable, then for all $x \in\, ]a_\phi,b_\phi[$,

    $$\phi^*(\phi'(x)) = x\phi'(x) - \phi(x).$$    (2.5)

If additionally $\phi$ is strictly convex, then for all $t \in \mathrm{Im}\,\phi'$ we have

    $$\phi^*(t) = t\,\phi'^{-1}(t) - \phi\left(\phi'^{-1}(t)\right) \quad \text{and} \quad \phi^{*\prime}(t) = \phi'^{-1}(t).$$    (2.6)

On the other hand, if $\phi$ is essentially smooth, then the interior of the domain of $\phi^*$ coincides with that of $\mathrm{Im}\,\phi'$, i.e., $(a_{\phi^*},b_{\phi^*}) = (\phi'(a_\phi),\phi'(b_\phi))$.
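For a concrete check of (2.1) and (2.5), take $\phi_0(x) = -\log x + x - 1$. A direct computation (ours, stated here as an assumption of the sketch) gives $\phi_0'(x) = 1 - 1/x$ and $\phi_0^*(t) = -\log(1-t)$ for $t < 1$. The code below compares this closed form with a brute-force grid approximation of the supremum; all function names are hypothetical.

```python
import math

def phi0(x):
    return -math.log(x) + x - 1

def phi0_prime(x):
    return 1 - 1 / x

def conj_numeric(t, xs):
    """Grid approximation of sup_x {t*x - phi0(x)} over the points xs."""
    return max(t * x - phi0(x) for x in xs)

def conj_closed(t):
    """Closed form phi0*(t) = -log(1 - t), valid for t < 1 (derived for this sketch)."""
    return -math.log(1 - t)

xs = [i / 1000 for i in range(1, 50001)]   # grid on ]0, 50]
```

The grid value can only undershoot the true supremum, so the comparison is made with a small tolerance. The identity (2.5) holds exactly for the closed form, since both sides reduce to $\log x$.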

Let $\mathcal{F}$ be some class of $\mathcal{B}$-measurable real valued functions $f$ defined on $\mathcal{X}$, and denote by $\mathcal{M}_\mathcal{F}$ the real vector subspace of $\mathcal{M}$ defined by

    $$\mathcal{M}_\mathcal{F} := \left\{Q \in \mathcal{M} \text{ such that } \int |f|\,d|Q| < \infty, \text{ for all } f \in \mathcal{F}\right\}.$$

In the following theorem, we recall a version of the dual representations of $\phi$-divergences obtained by Broniatowski and Keziou (2006) (for the proof, see Broniatowski and Keziou (2006) theorem 4.4).



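Theorem 2.1. Assume that $\phi$ is differentiable. Then, for all $Q \in \mathcal{M}_\mathcal{F}$ such that $D_\phi(Q,P)$ is finite and $\phi'\left(\frac{dQ}{dP}\right)$ belongs to $\mathcal{F}$, the $\phi$-divergence $D_\phi(Q,P)$ admits the dual representation

    $$D_\phi(Q,P) = \sup_{f\in\mathcal{F}} \left\{\int f\,dQ - \int \phi^*(f)\,dP\right\},$$    (2.7)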



MICHEL BRONIATOWSKI AND AMOR KEZIOU



and the function $f := \phi'(dQ/dP)$ is a dual optimal solution⁴. Furthermore, if $\phi$ is essentially smooth⁵, then $f := \phi'(dQ/dP)$ is the unique dual optimal solution ($P$-a.e.).
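The dual representation (2.7) can be verified by hand on a finite space. The sketch below uses $\phi_0(x) = -\log x + x - 1$, whose conjugate is $\phi_0^*(t) = -\log(1-t)$ for $t < 1$ (our computation, an assumption of this example), together with two hypothetical probability vectors `p` and `q` on three points. It checks that the dual objective equals $D_\phi(Q,P)$ at $f = \phi'(dQ/dP)$ and is smaller at two shifted functions.

```python
import math

def phi0(x):
    return -math.log(x) + x - 1

def phi0_prime(x):
    return 1 - 1 / x

def phi0_conj(t):
    """phi0*(t) = -log(1 - t) for t < 1 (derived for this sketch)."""
    return -math.log(1 - t)

def divergence(q, p):
    """D_phi(Q, P) = sum_x phi(q(x) / p(x)) * p(x) on a finite space."""
    return sum(phi0(qi / pi) * pi for qi, pi in zip(q, p))

def dual_objective(f, q, p):
    """int f dQ - int phi*(f) dP, the quantity maximised in (2.7)."""
    return (sum(fi * qi for fi, qi in zip(f, q))
            - sum(phi0_conj(fi) * pi for fi, pi in zip(f, p)))

p = [0.2, 0.5, 0.3]            # hypothetical P
q = [0.4, 0.4, 0.2]            # hypothetical Q
f_opt = [phi0_prime(qi / pi) for qi, pi in zip(q, p)]
primal = divergence(q, p)
dual_at_opt = dual_objective(f_opt, q, p)
dual_shift_down = dual_objective([fi - 0.1 for fi in f_opt], q, p)
dual_shift_up = dual_objective([fi + 0.1 for fi in f_opt], q, p)
```

Here $\phi_0$ is essentially smooth, so the maximiser is unique and any perturbation of `f_opt` strictly lowers the dual objective.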



3. Parametric estimation and tests through minimum $\phi$-divergence approach and duality technique

We consider an identifiable parametric model $\{P_\theta;\ \theta\in\Theta\}$ defined on some measurable space $(\mathcal{X},\mathcal{B})$, where $\Theta$ is some set in $\mathbb{R}^d$, not necessarily an open set. For simplicity, we write $D_\phi(\theta,\alpha)$ instead of $D_\phi(P_\theta,P_\alpha)$. We assume that for any $\theta$ in $\Theta$, $P_\theta$ has density $p_\theta$ with respect to some dominating $\sigma$-finite measure $\lambda$, which can be either with countable support or not. Assume further that the support $S$ of the p.m. $P_\theta$ does not depend upon $\theta$. On the basis of an i.i.d. sample $X_1,\ldots,X_n$ with distribution $P_{\theta_T}$, we intend to estimate $\theta_T$, the true unknown value of the parameter, which is assumed to be an interior point of the parameter space $\Theta$. We will consider only strictly convex functions $\phi$ which are essentially smooth. We will use the following assumption

    $$\int \left|\phi'\left(\frac{p_\theta(x)}{p_\alpha(x)}\right)\right| dP_\theta(x) < \infty.$$    (3.1)

Note that if the function $\phi$ satisfies

    there exists $0 < \delta < 1$ such that for all $c$ in $[1-\delta,1+\delta]$,
    we can find numbers $c_1, c_2, c_3$ such that    (3.2)
    $\phi(cx) \le c_1\phi(x) + c_2|x| + c_3$, for all real $x$,

then the assumption (3.1) is satisfied whenever $D_\phi(\theta,\alpha) < \infty$; see e.g. Broniatowski and Keziou (2006) lemma 3.2. Also the real convex functions $\phi_\gamma$ (1.4), associated to the class of power divergences, all satisfy the condition (3.2), including all standard divergences.

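For a given $\theta \in \Theta$, consider the class of functions

    $$\mathcal{F} = \mathcal{F}_\theta := \left\{x \mapsto \phi'\left(\frac{p_\theta(x)}{p_\alpha(x)}\right);\ \alpha \in \Theta\right\}.$$    (3.3)

By application of Theorem 2.1 above, when assumption (3.1) holds for any $\alpha \in \Theta$, we obtain

    $$D_\phi(\theta,\theta_T) = \sup_{f\in\mathcal{F}} \left\{\int f\,dP_\theta - \int \phi^*(f)\,dP_{\theta_T}\right\},$$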



⁴ i.e., the supremum in (2.7) is achieved at $f := \phi'(dQ/dP)$.

⁵ Note that this is equivalent to the condition that its conjugate $\phi^*$ is strictly convex.
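which, by (2.5), can be written as

    $$D_\phi(\theta,\theta_T) = \sup_{\alpha\in\Theta} \left\{\int \phi'\left(\frac{p_\theta}{p_\alpha}\right) dP_\theta - \int \left[\frac{p_\theta}{p_\alpha}\,\phi'\left(\frac{p_\theta}{p_\alpha}\right) - \phi\left(\frac{p_\theta}{p_\alpha}\right)\right] dP_{\theta_T}\right\}.$$    (3.4)

Furthermore, the supremum in this display is unique and it is achieved at $\alpha = \theta_T$ independently of the value of $\theta$. Hence, it is reasonable to estimate $D_\phi(\theta,\theta_T) := \int \phi(p_\theta/p_{\theta_T})\,dP_{\theta_T}$, the $\phi$-divergence between $P_\theta$ and $P_{\theta_T}$, by

    $$\widehat{D}_\phi(\theta,\theta_T) := \sup_{\alpha\in\Theta} \left\{\int \phi'\left(\frac{p_\theta}{p_\alpha}\right) dP_\theta - \int \left[\frac{p_\theta}{p_\alpha}\,\phi'\left(\frac{p_\theta}{p_\alpha}\right) - \phi\left(\frac{p_\theta}{p_\alpha}\right)\right] dP_n\right\},$$    (3.5)

in which we have replaced $P_{\theta_T}$ by its estimate $P_n$, the empirical measure associated to the data.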



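For a given $\theta \in \Theta$, since the supremum in (3.4) is unique and it is achieved at $\alpha = \theta_T$, define the following class of M-estimates of $\theta_T$

    $$\widehat{\alpha}_\phi(\theta) := \arg\sup_{\alpha\in\Theta} \left\{\int \phi'\left(\frac{p_\theta}{p_\alpha}\right) dP_\theta - \int \left[\frac{p_\theta}{p_\alpha}\,\phi'\left(\frac{p_\theta}{p_\alpha}\right) - \phi\left(\frac{p_\theta}{p_\alpha}\right)\right] dP_n\right\},$$    (3.6)

which we call "dual $\phi$-divergence estimates" (D$\phi$DE's); (in the sequel, we sometimes write $\widehat{\alpha}$ instead of $\widehat{\alpha}_\phi(\theta)$). Further, we have

    $$\inf_{\theta\in\Theta} D_\phi(\theta,\theta_T) = D_\phi(\theta_T,\theta_T) = 0.$$

The infimum in this display is unique and it is achieved at $\theta = \theta_T$. It follows that a natural definition of minimum $\phi$-divergence estimates of $\theta_T$, which we will call "minimum dual $\phi$-divergence estimates" (MD$\phi$DE's), is

    $$\widehat{\theta}_\phi := \arg\inf_{\theta\in\Theta}\sup_{\alpha\in\Theta} \left\{\int \phi'\left(\frac{p_\theta}{p_\alpha}\right) dP_\theta - \int \left[\frac{p_\theta}{p_\alpha}\,\phi'\left(\frac{p_\theta}{p_\alpha}\right) - \phi\left(\frac{p_\theta}{p_\alpha}\right)\right] dP_n\right\}.$$    (3.7)

In order to simplify formulas (3.5), (3.6) and (3.7), define the functions

    $$g(\theta,\alpha) : x \mapsto g(\theta,\alpha,x) := \frac{p_\theta(x)}{p_\alpha(x)}\,\phi'\left(\frac{p_\theta(x)}{p_\alpha(x)}\right) - \phi\left(\frac{p_\theta(x)}{p_\alpha(x)}\right),$$    (3.8)

    $$f(\theta,\alpha) : x \mapsto f(\theta,\alpha,x) := \phi'\left(\frac{p_\theta(x)}{p_\alpha(x)}\right)$$    (3.9)

and

    $$h(\theta,\alpha) : x \mapsto h(\theta,\alpha,x) := P_\theta f(\theta,\alpha) - g(\theta,\alpha,x).$$    (3.10)

Hence, (3.5), (3.6) and (3.7) can be written as follows

    $$\widehat{D}_\phi(\theta,\theta_T) := \sup_{\alpha\in\Theta} P_n h(\theta,\alpha),$$    (3.11)

    $$\widehat{\alpha}_\phi(\theta) := \arg\sup_{\alpha\in\Theta} P_n h(\theta,\alpha)$$    (3.12)

and

    $$\widehat{\theta}_\phi := \arg\inf_{\theta\in\Theta}\sup_{\alpha\in\Theta} P_n h(\theta,\alpha).$$    (3.13)

Formula (3.4) can then be written as

    $$D_\phi(\theta,\theta_T) = \sup_{\alpha\in\Theta} P_{\theta_T} h(\theta,\alpha).$$    (3.14)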



If the supremum in (3.12) is not unique, we define the estimate $\widehat{\alpha}_\phi(\theta)$ as any value of $\alpha \in \Theta$ that maximizes the function $\alpha \in \Theta \mapsto P_n h(\theta,\alpha)$. Also, if the infimum in (3.13) is not unique, the estimate $\widehat{\theta}_\phi$ is defined as any value of $\theta \in \Theta$ that minimizes the function $\theta \mapsto \sup_{\alpha\in\Theta} P_n h(\theta,\alpha)$. Conditions assuring the existence of the above estimates are given in sections 3.1 and 3.2 below.
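To make the dual estimates of $D_\phi(\theta,\theta_T)$ and $\theta_T$ concrete, here is a minimal sketch for the $\chi^2$ choice $\phi(x) = \frac{1}{2}(x-1)^2$ in the exponential model $p_\alpha(x) = \alpha e^{-\alpha x}$. The closed forms in the docstring are our own computations for this example (valid for $0 < \alpha < 2\theta$), the maximisation is a plain grid search, and the names `pn_h_chi2` and `dual_estimates` as well as the simulated data are hypothetical.

```python
import math
import random

def pn_h_chi2(theta, alpha, sample):
    """Empirical mean of h(theta, alpha, x) for phi(x) = (x - 1)^2 / 2, exponential model.

    Closed forms used here (derived for this sketch, valid for 0 < alpha < 2 * theta):
      P_theta f(theta, alpha) = theta^2 / (alpha * (2 * theta - alpha)) - 1
      g(theta, alpha, x)      = (r(x)^2 - 1) / 2,   with r = p_theta / p_alpha
    """
    p_theta_f = theta ** 2 / (alpha * (2 * theta - alpha)) - 1
    ratio = theta / alpha
    mean_g = sum(0.5 * ((ratio * math.exp(-(theta - alpha) * x)) ** 2 - 1)
                 for x in sample) / len(sample)
    return p_theta_f - mean_g

def dual_estimates(theta, sample, grid):
    """Grid versions of the dual estimates: the argmax and the max of alpha -> P_n h."""
    d_hat, alpha_hat = max((pn_h_chi2(theta, a, sample), a) for a in grid)
    return alpha_hat, d_hat

random.seed(0)
theta_true, theta = 1.0, 1.5
sample = [random.expovariate(theta_true) for _ in range(2000)]
grid = [0.4 + 0.005 * i for i in range(221)]   # alpha ranging over [0.4, 1.5]

alpha_hat, d_hat = dual_estimates(theta, sample, grid)
# chi^2-divergence between Exp(theta) and Exp(theta_true), computed in closed form.
d_true = 0.5 * (theta ** 2 / (theta_true * (2 * theta - theta_true)) - 1)
```

With these values `d_true` equals 0.0625; on a sample of this size `alpha_hat` should fall near `theta_true` and `d_hat` near `d_true`, in line with the population statement that the supremum is achieved at $\alpha = \theta_T$.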



Remark 3.1. For the $L_1$ distance, i.e. when $\phi(x) = |x-1|$, formula (3.4) does not apply since the corresponding $\phi$ function is not differentiable. However, using the general dual representation of divergences given in Broniatowski and Keziou (2006) theorem 4.1, we can obtain an explicit formula for the $L_1$ distance avoiding the differentiability assumption. A methodology on estimation and testing in $L_1$ distance has been proposed by Devroye and Lugosi (2001), and its consequences for composite hypothesis testing and for model selection based density estimates for nested classes of densities are presented in Devroye et al. (2002) and Biau and Devroye (2003).



Remark 3.2. (Another view of the ML estimate). The maximum likelihood estimate belongs to both classes of estimates (3.12) and (3.13). Indeed, it is obtained when $\phi(x) = -\log x + x - 1$, that is as the dual modified $KL$-divergence estimate or as the minimum dual modified $KL$-divergence estimate, i.e., MLE = D($KL_m$)DE = MD($KL_m$)DE. Indeed, we then have $P_\theta f(\theta,\alpha) = 0$ and $P_n h(\theta,\alpha) = -\int \log\left(\frac{p_\theta}{p_\alpha}\right) dP_n$. Hence by definitions (3.6) and (3.7), we get

    $$\widehat{\alpha}_{KL_m}(\theta) = \arg\sup_{\alpha\in\Theta} -\int \log\left(\frac{p_\theta}{p_\alpha}\right) dP_n = \arg\sup_{\alpha\in\Theta} \int \log(p_\alpha)\,dP_n = \text{MLE}$$

independently of $\theta$, and

    $$\widehat{\theta}_{KL_m} = \arg\inf_{\theta\in\Theta}\sup_{\alpha\in\Theta} -\int \log\left(\frac{p_\theta}{p_\alpha}\right) dP_n = \arg\sup_{\theta\in\Theta} \int \log(p_\theta)\,dP_n = \text{MLE}.$$

So, the MLE can be seen as the estimate of $\theta_T$ that minimizes the estimate of the $KL_m$-divergence between the parametric model $\{P_\theta;\ \theta\in\Theta\}$ and the p.m. $P_{\theta_T}$.
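The two identities of Remark 3.2 can be illustrated numerically. The sketch below assumes the exponential model $p_\alpha(x) = \alpha e^{-\alpha x}$, whose MLE is the reciprocal of the sample mean, and a small hypothetical data set; both optimisations are done by grid search and the function names are ours.

```python
import math

def pn_h_klm(theta, alpha, sample):
    """P_n h(theta, alpha) = -mean of log(p_theta / p_alpha), exponential model."""
    return -sum(math.log(theta) - theta * x - math.log(alpha) + alpha * x
                for x in sample) / len(sample)

def alpha_hat(theta, sample, grid):
    """Dual KL_m estimate: maximiser of alpha -> P_n h(theta, alpha) on the grid."""
    return max(grid, key=lambda a: pn_h_klm(theta, a, sample))

def theta_hat(sample, grid):
    """Minimum dual KL_m estimate: argmin over theta of sup over alpha of P_n h."""
    return min(grid, key=lambda t: max(pn_h_klm(t, a, sample) for a in grid))

sample = [0.2, 1.1, 0.7, 2.4, 0.6]          # hypothetical data with mean 1.0
grid = [i / 20 for i in range(1, 61)]       # candidate values 0.05, 0.10, ..., 3.00
mle = len(sample) / sum(sample)             # exponential-rate MLE, 1 / mean
```

Whatever value of $\theta$ is used, `alpha_hat` lands on the grid point closest to the MLE, and so does `theta_hat`.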
3.1. The asymptotic properties of the D$\phi$DE's $\widehat{\alpha}_\phi(\theta)$ and $\widehat{D}_\phi(\theta,\theta_T)$ for a given $\theta$ in $\Theta$. This section deals with the asymptotic properties of the estimates (3.11) and (3.12). We will use similar arguments as developed in van der Vaart (1998) sections 5.2 and 5.6 under classical conditions, for the study of M-estimates. In the sequel, we assume that condition (3.1) holds for any $\alpha \in \Theta$, and use $\|\cdot\|$ to denote the Euclidean norm in $\mathbb{R}^d$.



3.1.1. Consistency. Consider the following conditions
(c.1) The estimate $\widehat{\alpha}_\phi(\theta)$ exists;
(c.2) $\sup_{\alpha\in\Theta} |P_n h(\theta,\alpha) - P_{\theta_T} h(\theta,\alpha)|$ converges to zero a.s. (resp. in probability);
(c.3) for any positive $\epsilon$, there exists some positive $\eta$ such that for all $\alpha \in \Theta$ satisfying $\|\alpha - \theta_T\| > \epsilon$ we have $P_{\theta_T} h(\theta,\alpha) < P_{\theta_T} h(\theta,\theta_T) - \eta$.

Remark 3.3. Condition (c.1) is fulfilled for example if the function $\alpha \in \Theta \mapsto P_n h(\theta,\alpha)$ is continuous and $\Theta$ is compact. Condition (c.2) is satisfied if $\{x \mapsto h(\theta,\alpha,x);\ \alpha \in \Theta\}$ is a Glivenko-Cantelli class of functions. Condition (c.3) means that the maximizer $\alpha = \theta_T$ of the function $\alpha \mapsto P_{\theta_T} h(\theta,\alpha)$ is well-separated. This condition holds, for example, when the function $\alpha \in \Theta \mapsto P_{\theta_T} h(\theta,\alpha)$ is strictly concave and $\Theta$ is convex, which is the case for the following two examples:

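Example 3.1. Consider the case $\phi(x) = -\log x + x - 1$ and the normal model

    $$\{\mathcal{N}(\alpha,1);\ \alpha \in \Theta = \mathbb{R}\}.$$

Hence, we obtain

    $$P_{\theta_T} h(\theta,\alpha) = \frac{1}{2}(\theta - \theta_T)^2 - \frac{1}{2}(\alpha - \theta_T)^2.$$    (3.15)

We see that condition (c.3) is satisfied; we can choose $\eta = \epsilon^2/2$.

Example 3.2. Consider the case $\phi(x) = -\log x + x - 1$ and the exponential model

    $$\{p_\alpha(x) = \alpha\exp(-\alpha x);\ \alpha \in \Theta = \mathbb{R}_+^*\}.$$

Hence, we obtain

    $$P_{\theta_T} h(\theta,\alpha) = -\log\theta + \frac{\theta}{\theta_T} + \log\alpha - \frac{\alpha}{\theta_T},$$    (3.16)

which is strictly concave (in $\alpha$). Hence, condition (c.3) is satisfied.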
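The closed form (3.16) can be checked by direct numerical integration of $h(\theta,\alpha,x) = -\log(p_\theta(x)/p_\alpha(x))$ against the density $p_{\theta_T}$. The sketch below uses a midpoint rule on a truncated range; the truncation point, the number of steps and the function names are arbitrary choices made for this illustration.

```python
import math

def ph_closed(theta, alpha, theta_true):
    """Right-hand side of (3.16)."""
    return (-math.log(theta) + theta / theta_true
            + math.log(alpha) - alpha / theta_true)

def ph_numeric(theta, alpha, theta_true, upper=60.0, steps=60000):
    """Midpoint-rule approximation of int h(theta, alpha, x) p_{theta_true}(x) dx on [0, upper]."""
    dx = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx
        # h = -log(p_theta / p_alpha) for exponential densities
        h = -(math.log(theta) - theta * x) + (math.log(alpha) - alpha * x)
        total += h * theta_true * math.exp(-theta_true * x) * dx
    return total
```

For instance, with $\theta = 2$ and $\theta_T = 1$ the value at $\alpha = \theta_T$ exceeds the values at $\alpha = 0.7$ and $\alpha = 1.5$, as the well-separation condition (c.3) requires.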

Proposition 3.1. (1) Under assumptions (c.1-2), the estimate $\widehat{D}_\phi(\theta,\theta_T)$ converges a.s. (resp. in probability) to $D_\phi(\theta,\theta_T)$.
(2) Assume that the assumptions (c.1-2-3) hold. Then the estimate $\widehat{\alpha}_\phi(\theta)$ converges in probability to $\theta_T$.






3.1.2. Asymptotic Normality. Assume that $\theta_T$ is an interior point of $\Theta$, the convex function $\phi$ has continuous derivatives up to 4th order, and the density $p_\alpha(x)$ has continuous partial derivatives up to 3rd order (for all $x$ $\lambda$-a.e.). Denote by $I_{\theta_T}$ the Fisher information matrix

    $$I_{\theta_T} := \int \frac{\dot{p}_{\theta_T}\,\dot{p}_{\theta_T}^T}{p_{\theta_T}}\,d\lambda.$$

In the following theorem, we give the limit laws of the estimates $\widehat{\alpha}_\phi(\theta)$ and $\widehat{D}_\phi(\theta,\theta_T)$. We will use the following assumptions.
(A.0) The estimate $\widehat{\alpha}_\phi(\theta)$ exists and is consistent;
(A.1) There exists a neighborhood $N(\theta_T)$ of $\theta_T$ such that the first and second order partial derivatives (w.r.t. $\alpha$) of $f(\theta,\alpha,x)p_\theta(x)$ are dominated on $N(\theta_T)$ by some $\lambda$-integrable functions. The third order partial derivatives (w.r.t. $\alpha$) of $h(\theta,\alpha,x)$ are dominated on $N(\theta_T)$ by some $P_{\theta_T}$-integrable functions;
(A.2) The integrals $P_{\theta_T}\|(\partial/\partial\alpha)h(\theta,\theta_T)\|^2$ and $P_{\theta_T}\|(\partial^2/\partial\alpha^2)h(\theta,\theta_T)\|$ are finite, and the matrix $P_{\theta_T}(\partial^2/\partial\alpha^2)h(\theta,\theta_T)$ is non singular;
(A.3) The integral $P_{\theta_T}h(\theta,\theta_T)^2$ is finite.

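Theorem 3.2. Assume that assumptions (A.0-1-2) hold. Then, we have
(a) $\sqrt{n}\,(\widehat{\alpha}_\phi(\theta) - \theta_T)$ converges in distribution to a centered multivariate normal random variable with covariance matrix

    $$V_\phi(\theta,\theta_T) = S^{-1}MS^{-1}$$    (3.17)

with $S := -P_{\theta_T}(\partial^2/\partial\alpha^2)h(\theta,\theta_T)$ and $M := P_{\theta_T}(\partial/\partial\alpha)h(\theta,\theta_T)\,(\partial/\partial\alpha)^T h(\theta,\theta_T)$. If $\theta_T = \theta$, then $V_\phi(\theta,\theta_T) = V(\theta_T) = I_{\theta_T}^{-1}$.
(b) If $\theta_T = \theta$, then the statistic $\frac{2n}{\phi''(1)}\widehat{D}_\phi(\theta,\theta_T)$ converges in distribution to a $\chi^2$ random variable with $d$ degrees of freedom.
(c) If additionally assumption (A.3) holds, then when $\theta \neq \theta_T$, we have that $\sqrt{n}\left(\widehat{D}_\phi(\theta,\theta_T) - D_\phi(\theta,\theta_T)\right)$ converges in distribution to a centered normal random variable with variance

    $$\sigma_\phi^2(\theta,\theta_T) = P_{\theta_T}h(\theta,\theta_T)^2 - \left(P_{\theta_T}h(\theta,\theta_T)\right)^2.$$    (3.18)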

Remark 3.4. Our first result (proposition 3.1 above) provides a general solution for the consistency of the global maximum (3.12) under strong but usual conditions, which are also difficult to check; see van der Vaart (1998) chapter 5. Moreover, in practice, the optimization in (3.12) is handled through gradient descent algorithms, depending on some initial guess $\alpha_0 \in \Theta$, which may provide a local maximum (not necessarily global) of $P_n h(\theta,\cdot)$. Hence, it is desirable to prove that in a "neighborhood" of $\theta_T$ there exists a maximum of $P_n h(\theta,\cdot)$ which indeed converges to $\theta_T$; this is the scope of theorem 3.3 in the following subsection, which states that for some "good" $\alpha_0$ (near $\theta_T$) the algorithm provides a consistent estimate. It is well known that, in various classical models, the global maximizer of the likelihood function may not exist or be inconsistent. Typical examples are provided in mixture models. Consider the Beta-mixture model given in Ferguson (1982) section 3

    $$p_\theta(x) = \theta\,g(x|1,1) + (1-\theta)\,g(x|\gamma(\theta),\beta(\theta)),$$

where $\Theta = [1/2,1]$, $g(x|\gamma(\theta),\beta(\theta))$ is the $Be(\gamma,\beta)$-density and $\gamma(\theta) = \theta\delta(\theta)$ and $\beta(\theta) = (1-\theta)\delta(\theta)$ with $\delta(\theta) \to +\infty$ sufficiently fast as $\theta \to 1$. The ML estimate converges to 1 (a.s.) whatever the value of $\theta_T$ in $\Theta$; see Ferguson (1982) section 3 for the proof. However, if we take for example $\theta_T = 3/4$, theorem 3.3 hereafter proves the existence and consistency of a sequence of local maximizers under weak assumptions which hold for this example. Other motivations for the results of theorem 3.3 are given in remark 3.5 below.

3.1.3. Existence, consistency and limit laws of a sequence of local maxima. We use similar arguments as developed in Qin and Lawless (1994) lemma 1. Assume that $\theta_T$ is an interior point of $\Theta$, the convex function $\phi$ has continuous derivatives up to 4th order, and the density $p_\alpha(x)$ has continuous partial derivatives up to 3rd order (for all $x$ $\lambda$-a.e.). In the following theorem, we state the existence and the consistency of a sequence of local maxima $\widetilde{\alpha}_\phi(\theta)$ and $\widetilde{D}_\phi(\theta,\theta_T)$. We also give their limit laws.

Theorem 3.3. Assume that assumptions (A.1) and (A.2) hold. Then, we have
(a) Let $B(\theta_T,n^{-1/3}) := \{\alpha \in \Theta;\ \|\alpha - \theta_T\| \le n^{-1/3}\}$. Then, as $n \to \infty$, with probability one, the function $\alpha \mapsto P_n h(\theta,\alpha)$ attains its maximum value at some point $\widetilde{\alpha}_\phi(\theta)$ in the interior of the ball $B$, and satisfies $P_n(\partial/\partial\alpha)h(\theta,\widetilde{\alpha}_\phi(\theta)) = 0$.
(b) $\sqrt{n}\,(\widetilde{\alpha}_\phi(\theta) - \theta_T)$ converges in distribution to a centered multivariate normal random variable with covariance matrix

    $$V_\phi(\theta,\theta_T) = S^{-1}MS^{-1}.$$    (3.19)

(c) If $\theta_T = \theta$, then the statistic $\frac{2n}{\phi''(1)}\widetilde{D}_\phi(\theta,\theta_T)$ converges in distribution to a $\chi^2$ random variable with $d$ degrees of freedom.
(d) If additionally assumption (A.3) holds, then when $\theta \neq \theta_T$, we have that $\sqrt{n}\left(\widetilde{D}_\phi(\theta,\theta_T) - D_\phi(\theta,\theta_T)\right)$ converges in distribution to a centered normal random variable with variance $\sigma_\phi^2(\theta,\theta_T)$.

Remark 3.5. The results of this theorem are motivated by the following statements
■ The estimates $\widetilde{\alpha}_\phi(\theta)$ can be calculated if the statistician has some prior knowledge of the true unknown parameter $\theta_T$.
■ The hypotheses are satisfied for a large class of parametric models for which the support does not depend upon $\theta$, such as normal, log-normal, exponential, Gamma, Beta, Weibull, etc.; see for example van der Vaart (1998) paragraph 5.43.
■ The maps $h(\theta,\alpha) : x \mapsto h(\theta,\alpha,x)$ and $(\theta,\alpha) \mapsto P_{\theta_T}h(\theta,\alpha)$ are allowed to take the value $-\infty$; for example, take $\phi(x) = -\log x + x - 1$, and consider the model

    $$\{P_\alpha = \alpha\,\mathrm{Cauchy}(0) + (1-\alpha)\,\mathcal{N}(0,1);\ \alpha \in \Theta\},$$

with $\Theta = [0,1]$ and $\theta_T = 1/2$. Then, $P_{\theta_T}h(\theta,1) = -\infty$ for all $\theta \in\, ]0,1[$.
■ The theorem states existence, consistency and asymptotic normality of the estimates.
■ The estimate $\widetilde{\alpha}_\phi(\theta)$ may exist and be consistent whereas $\widehat{\alpha}_\phi(\theta)$ does not in many cases.
■ One interesting situation also is when the map $\alpha \in \Theta \mapsto P_n h(\theta,\alpha)$ is strictly concave and $\Theta$ is convex; the estimates $\widetilde{\alpha}_\phi(\theta)$ and $\widehat{\alpha}_\phi(\theta)$ are then the same.

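Remark 3.6. Using theorem 3.3 part (c), the estimate $\widehat{D}_\phi(\theta_0,\theta_T)$ can be used to perform statistical tests (asymptotically of level $\epsilon$) of the null hypothesis $\mathcal{H}_0 : \theta_T = \theta_0$ against the alternative $\mathcal{H}_1 : \theta_T \neq \theta_0$ for a given value $\theta_0$. Since $D_\phi(\theta_0,\theta_T)$ is nonnegative and takes value zero only when $\theta_T = \theta_0$, the tests are defined through the critical region

    $$C_\phi(\theta_0,\theta_T) := \left\{\frac{2n}{\phi''(1)}\,\widehat{D}_\phi(\theta_0,\theta_T) > q_{d,\epsilon}\right\}$$    (3.20)

where $q_{d,\epsilon}$ is the $(1-\epsilon)$-quantile of the $\chi^2$ distribution with $d$ degrees of freedom. Note that these tests are all consistent, since $\widehat{D}_\phi(\theta_0,\theta_T)$ are $n$-consistent estimates of $D_\phi(\theta_0,\theta_T) = 0$ under $\mathcal{H}_0$, and $\sqrt{n}$-consistent estimates of $D_\phi(\theta_0,\theta_T) > 0$ under $\mathcal{H}_1$; see parts (c) and (d) in theorem 3.3 above. Further, the asymptotic result (d) in theorem 3.3 above can be used to give an approximation of the power function $\theta_T \mapsto \beta(\theta_T) := P_{\theta_T}(C_\phi(\theta_0,\theta_T))$. We then obtain the following approximation

    $$\beta(\theta_T) \approx 1 - F_{\mathcal{N}}\left(\frac{\sqrt{n}}{\sigma_\phi(\theta_0,\theta_T)}\left[\frac{\phi''(1)}{2n}\,q_{d,\epsilon} - D_\phi(\theta_0,\theta_T)\right]\right)$$    (3.21)

where $F_{\mathcal{N}}$ is the cumulative distribution function of a normal random variable with mean zero and variance one. An important application of this approximation is the approximate sample size (3.22) below that ensures a power $\beta$ for a given alternative $\theta_T \neq \theta_0$. Let $n_0$ be the positive root of the equation

    $$\beta = 1 - F_{\mathcal{N}}\left(\frac{\sqrt{n}}{\sigma_\phi(\theta_0,\theta_T)}\left[\frac{\phi''(1)}{2n}\,q_{d,\epsilon} - D_\phi(\theta_0,\theta_T)\right]\right).$$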
The required sample size is then

    $$n^* = [n_0] + 1$$    (3.22)

where $[\cdot]$ is used here to denote "integer part of".
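The power approximation (3.21) and the sample size (3.22) are straightforward to evaluate once $D_\phi(\theta_0,\theta_T)$, $\sigma_\phi(\theta_0,\theta_T)$ and $q_{d,\epsilon}$ are available. The sketch below is an illustration under assumptions of ours: it takes the normal model of Example 3.1 with $\phi = \phi_0$, for which $D_\phi(\theta_0,\theta_T) = (\theta_0-\theta_T)^2/2$ by (3.15) and, by a direct computation, $\sigma_\phi(\theta_0,\theta_T) = |\theta_T - \theta_0|$. With $d = 1$ the $\chi^2$ quantile is the square of a normal quantile. The function names are hypothetical, and the search returns the smallest integer $n$ whose approximate power reaches $\beta$, which agrees with $[n_0]+1$ whenever $n_0$ is not an integer.

```python
import math
from statistics import NormalDist

def power_approx(n, d, sigma, q, phi_dd1=1.0):
    """Approximation (3.21): 1 - F_N( sqrt(n)/sigma * (phi''(1) * q / (2n) - D) )."""
    z = math.sqrt(n) / sigma * (phi_dd1 * q / (2 * n) - d)
    return 1 - NormalDist().cdf(z)

def required_sample_size(beta, d, sigma, q, phi_dd1=1.0, n_max=10 ** 7):
    """Smallest integer n with approximate power >= beta (the power increases with n)."""
    hi = 1
    while power_approx(hi, d, sigma, q, phi_dd1) < beta:
        hi *= 2
        if hi > n_max:
            raise ValueError("target power not reached below n_max")
    lo = 1
    while lo < hi:                      # bisection on the integers
        mid = (lo + hi) // 2
        if power_approx(mid, d, sigma, q, phi_dd1) >= beta:
            hi = mid
        else:
            lo = mid + 1
    return lo

# Assumed example: N(theta, 1) model, KL_m divergence, theta_0 = 0, theta_T = 0.5.
theta0, theta_true, eps, beta = 0.0, 0.5, 0.05, 0.9
d_value = 0.5 * (theta0 - theta_true) ** 2          # from (3.15) with alpha = theta_T
sigma_value = abs(theta_true - theta0)              # derived for this sketch
q_value = NormalDist().inv_cdf(1 - eps / 2) ** 2    # chi-square quantile, d = 1
n_star = required_sample_size(beta, d_value, sigma_value, q_value)
```

Bisection is used instead of solving the equation for $n_0$ in closed form only to keep the sketch short.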

Remark 3.7. (Another view of the generalized likelihood ratio test and approximation
of the power function through $KL_m$-divergence). In the particular case
of the $KL_m$-divergence, i.e., when $\phi(x) = \phi_0(x) := -\log x + x - 1$, we obtain from (3.20)
the critical area

\[ C_{KL_m}(\theta_0,\theta_T) := \left\{ 2n \sup_{\alpha\in\Theta} P_n \log\left(\frac{p_\alpha}{p_{\theta_0}}\right) > q_{d,\epsilon} \right\} = \left\{ 2\log \frac{\sup_{\alpha\in\Theta}\prod_{i=1}^n p_\alpha(X_i)}{\prod_{i=1}^n p_{\theta_0}(X_i)} > q_{d,\epsilon} \right\}, \]

which is to say that the test obtained in this case is precisely the generalized likelihood ratio
one. The power approximation and the approximate sample size guaranteeing a power $\beta$
for a given alternative (for the GLRT) are given by (3.21) and (3.22), respectively, where
$\phi$ is replaced by $\phi_0$ and $D_\phi$ by $KL_m$.
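The reduction to the likelihood ratio can be checked by a short computation. The sketch below assumes that $h(\theta,\alpha,x) = \int f(\theta,\alpha)\,dP_\theta - g(\theta,\alpha,x)$ with $f = \phi'(p_\theta/p_\alpha)$ and $g = (p_\theta/p_\alpha)\,\phi'(p_\theta/p_\alpha) - \phi(p_\theta/p_\alpha)$, which is the form that the expressions of $f$ and $g$ given for the $\chi^2$-divergence in section 4.2 suggest, and that $p_\alpha$ and $p_{\theta_0}$ share the same support.

```latex
\begin{align*}
\phi_0(x) &= -\log x + x - 1, \qquad \phi_0'(x) = 1 - \frac{1}{x}, \qquad \phi_0''(1) = 1, \\
f(\theta_0,\alpha) &= \phi_0'\!\left(\frac{p_{\theta_0}}{p_\alpha}\right) = 1 - \frac{p_\alpha}{p_{\theta_0}},
\qquad \int f(\theta_0,\alpha)\, dP_{\theta_0} = 1 - \int p_\alpha \, d\lambda = 0, \\
g(\theta_0,\alpha) &= \frac{p_{\theta_0}}{p_\alpha}\,\phi_0'\!\left(\frac{p_{\theta_0}}{p_\alpha}\right)
 - \phi_0\!\left(\frac{p_{\theta_0}}{p_\alpha}\right) = \log\frac{p_{\theta_0}}{p_\alpha}, \\
P_n h(\theta_0,\alpha) &= -P_n g(\theta_0,\alpha) = \frac{1}{n}\sum_{i=1}^n \log\frac{p_\alpha(X_i)}{p_{\theta_0}(X_i)}.
\end{align*}
```

Taking the supremum over $\alpha$ and multiplying by $2n/\phi_0''(1) = 2n$ gives twice the log-likelihood ratio, as stated in the remark.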

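3.2. The asymptotic behavior of the MD$\phi$DE's. We now explore the asymptotic
properties of the estimates $\hat{\theta}_\phi$ and $\hat{\alpha}_\phi(\hat{\theta}_\phi)$ defined in (3.13) and (3.12). We assume that
condition (3.1) holds for any $\alpha, \theta \in \Theta$.

3.2.1. Consistency. We state consistency under the following assumptions:

(c.4) The estimates $\hat{\theta}_\phi$ and $\hat{\alpha}_\phi(\hat{\theta}_\phi)$ exist.

(c.5) $\sup_{\{\alpha,\theta\in\Theta\}} \left| P_n h(\theta,\alpha) - P_{\theta_T} h(\theta,\alpha) \right|$ tends to 0 in probability;

(a) for any positive $\epsilon$, there exists some positive $\eta$, such that for any $\alpha$ in $\Theta$ with
$\|\alpha - \theta_T\| > \epsilon$ and for all $\theta \in \Theta$, it holds $P_{\theta_T} h(\theta,\alpha) < P_{\theta_T} h(\theta,\theta_T) - \eta$;

(b) there exists a neighborhood of $\theta_T$, say $N(\theta_T)$, such that for any positive $\epsilon$,
there exists some positive $\eta$ such that for all $\alpha \in N(\theta_T)$ and all $\theta \in \Theta$ satisfying
$\|\theta - \theta_T\| > \epsilon$, it holds $P_{\theta_T} h(\theta_T,\alpha) < P_{\theta_T} h(\theta,\alpha) - \eta$;

(c.6) there exists some neighborhood $N(\theta_T)$ of $\theta_T$ and a positive function $H$ such that
for all $\alpha$ in $N(\theta_T)$, $\|h(\theta_T,\alpha,x)\| \le H(x)$ ($P_{\theta_T}$-a.s.) with $P_{\theta_T} H < \infty$.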

Remark 3.8. Condition (c.5) is fulfilled if $\{x \mapsto h(\theta,\alpha,x);\ (\theta,\alpha) \in \Theta^2\}$ is a Glivenko-Cantelli
class of functions. Conditions (c.5.a) and (c.5.b) mean that the saddle-point
$(\theta_T,\theta_T)$ of $(\theta,\alpha) \in \Theta\times\Theta \mapsto P_n h(\theta,\alpha)$ is well-separated. Note that these two conditions
are not very restrictive; they are satisfied for example when $\Theta$ is convex and the function
$(\theta,\alpha) \in \Theta\times\Theta \mapsto P_n h(\theta,\alpha)$ is concave in $\alpha$ (for all $\theta$) and convex in $\theta$ (for all $\alpha$). This
is the case for examples 3.1 and 3.2 above, where both conditions (c.5.a) and (c.5.b) are satisfied;

2 

we can take rj = 

Proposition 3.4. Assume that conditions (c.4-5-6) hold. Then,

(1) $\sup_{\theta\in\Theta} \|\hat{\alpha}_\phi(\theta) - \theta_T\|$ tends to 0 in probability.

(2) The MD$\phi$ estimate $\hat{\theta}_\phi$ converges to $\theta_T$ in probability.

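3.3. Asymptotic normality. Assume that $\theta_T$ is an interior point of $\Theta$, the convex function
$\phi$ has continuous derivatives up to 4th order, and the density $p_\theta(x)$ has continuous
partial derivatives up to 3rd order (for all $x$ $\lambda$-a.e.). In the following theorem we state the
asymptotic normality of the estimates $\hat{\theta}_\phi$ and $\hat{\alpha}_\phi(\hat{\theta}_\phi)$. We will use the following assumptions:

(A.4) The estimates $\hat{\theta}_\phi$ and $\hat{\alpha}_\phi(\hat{\theta}_\phi)$ exist and are consistent;

(A.5) There exists a neighborhood $N(\theta_T)$ of $\theta_T$ such that the first and second order
partial derivatives (w.r.t. $\alpha$ and $\theta$) of $f(\theta,\alpha,x)\,p_\theta(x)$ are dominated on $N(\theta_T)\times N(\theta_T)$
by $\lambda$-integrable functions. The third partial derivatives (w.r.t. $\alpha$ and $\theta$) of
$h(\theta,\alpha,x)$ are dominated on $N(\theta_T)\times N(\theta_T)$ by some $P_{\theta_T}$-integrable functions;

(A.6) The integrals $P_{\theta_T}\|(\partial/\partial\alpha)h(\theta_T,\theta_T)\|^2$, $P_{\theta_T}\|(\partial/\partial\theta)h(\theta_T,\theta_T)\|^2$,
$P_{\theta_T}\|(\partial^2/\partial\alpha^2)h(\theta_T,\theta_T)\|$, $P_{\theta_T}\|(\partial^2/\partial\theta^2)h(\theta_T,\theta_T)\|$ and $P_{\theta_T}\|(\partial^2/\partial\theta\partial\alpha)h(\theta_T,\theta_T)\|$
are finite, and the matrix $I_{\theta_T}$ is non singular.

Theorem 3.5. Assume that conditions (A.4-5-6) hold. Then, both $\sqrt{n}\left(\hat{\theta}_\phi - \theta_T\right)$ and
$\sqrt{n}\left(\hat{\alpha}_\phi(\hat{\theta}_\phi) - \theta_T\right)$ converge in distribution to a centered multivariate normal random
variable with covariance matrix $V = I_{\theta_T}^{-1}$.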

3.3.1. Existence, consistency and limit laws of a sequence of local minima-maxima. Assume
that $\theta_T$ is an interior point of $\Theta$, the convex function $\phi$ has continuous derivatives up
to 4th order, and the density $p_\theta(x)$ has continuous partial derivatives up to 3rd order (for
all $x$ $\lambda$-a.e.). In the following theorem we state the existence and consistency of a sequence
of local minima-maxima $\tilde{\theta}_\phi$ and $\tilde{\alpha}_\phi(\tilde{\theta}_\phi)$. We also give their limit laws.

Theorem 3.6. Assume that conditions (A.5) and (A.6) hold.

(a) Let $B := \left\{\theta \in \Theta;\ \|\theta - \theta_T\| \le n^{-1/3}\right\}$. Then, as $n \to \infty$, with probability one, the
function $(\theta,\alpha) \mapsto P_n h(\theta,\alpha)$ attains its min-max value at some point $\left(\tilde{\theta}_\phi, \tilde{\alpha}_\phi(\tilde{\theta}_\phi)\right)$
in the interior of $B\times B$, and satisfies $P_n(\partial/\partial\alpha)h\left(\tilde{\theta}_\phi,\tilde{\alpha}_\phi(\tilde{\theta}_\phi)\right) = 0$ and
$P_n(\partial/\partial\theta)h\left(\tilde{\theta}_\phi,\tilde{\alpha}_\phi(\tilde{\theta}_\phi)\right) = 0$.

(b) Both $\sqrt{n}\left(\tilde{\theta}_\phi - \theta_T\right)$ and $\sqrt{n}\left(\tilde{\alpha}_\phi(\tilde{\theta}_\phi) - \theta_T\right)$ converge in distribution to a centered
multivariate normal random variable with covariance matrix $V = I_{\theta_T}^{-1}$.

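3.4. Composite tests by minimum $\phi$-divergence. Let $\Theta_0$ be a subset of $\Theta$. We
assume that there exists an open set $B_0 \subset \mathbb{R}^{d-l}$ and mappings $r : \Theta \to \mathbb{R}^{l}$ and $s : B_0 \to \mathbb{R}^{d}$
such that the matrices $R(\theta) := \frac{\partial}{\partial\theta} r(\theta)$ and $S(\beta) := \frac{\partial}{\partial\beta} s(\beta)$ exist, with elements
continuous, and are of rank $l$ and $(d-l)$, respectively, $\Theta_0 = \{s(\beta);\ \beta \in B_0\}$ and $r(\theta) = 0$
for all $\theta \in \Theta_0$. Consider the composite null hypothesis

\[ \mathcal{H}_0 : \theta_T \in \Theta_0 \quad\text{versus}\quad \mathcal{H}_1 : \theta_T \in \Theta\setminus\Theta_0. \tag{3.23} \]

This is equivalent to

\[ \mathcal{H}_0 : \theta_T \in s(B_0) \quad\text{versus}\quad \mathcal{H}_1 : \theta_T \in \Theta\setminus s(B_0). \]

Using (3.14), the $\phi$-divergence $D_\phi(\Theta_0,\theta_T)$, between the set of distributions $\{P_\theta \text{ such that } \theta \in \Theta_0\}$
and the p.m. $P_{\theta_T}$, can be written as $D_\phi(\Theta_0,\theta_T) = \inf_{\theta\in\Theta_0}\sup_{\alpha\in\Theta} P_{\theta_T} h(\theta,\alpha)$. Hence, it
can be estimated by

\[ \widehat{D}_\phi(\Theta_0,\theta_T) := \inf_{\theta\in\Theta_0} \widehat{D}_\phi(\theta,\theta_T) := \inf_{\theta\in\Theta_0}\sup_{\alpha\in\Theta} P_n h(\theta,\alpha). \]

We use $\widehat{D}_\phi(\Theta_0,\theta_T)$ to perform statistical tests pertaining to (3.23). Since $D_\phi(\Theta_0,\theta_T) :=
\inf_{\theta\in\Theta_0} D_\phi(\theta,\theta_T)$ is positive under $\mathcal{H}_1$ and takes value 0 only under $\mathcal{H}_0$ (provided that the
infimum is attained on $\Theta_0$), we reject $\mathcal{H}_0$ whenever $\widehat{D}_\phi(\Theta_0,\theta_T)$ takes large values. The
following theorem provides the limit distribution of $\widehat{D}_\phi(\Theta_0,\theta_T)$ under the null hypothesis
$\mathcal{H}_0$.

Theorem 3.7. Let us assume that the conditions in theorem 3.5 are satisfied. Under $\mathcal{H}_0$,
the statistics $\frac{2n}{\phi''(1)}\widehat{D}_\phi(\Theta_0,\theta_T)$ converge in distribution to a $\chi^2$ random variable with $l$
degrees of freedom.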

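The following theorem gives the limit laws of the test statistics $\frac{2n}{\phi''(1)}\widehat{D}_\phi(\Theta_0,\theta_T)$ under the
alternative hypothesis $\mathcal{H}_1 : \theta_T \in \Theta\setminus\Theta_0$. We will use the following assumptions.

(C.1) The minimum of $\theta \mapsto D_\phi(\theta,\theta_T)$ on $\Theta_0$ is attained at some point, say $\theta^* := s(\beta^*)$
with $\beta^* \in B_0$; uniqueness then follows by strict convexity of $\phi$ and the model
identifiability assumption;

(C.2) There exists a neighborhood $N(\beta^*)$ of $\beta^*$ and a neighborhood $N(\theta_T)$ of $\theta_T$ such that
the first and second order partial derivatives (w.r.t. $\alpha$ and $\beta$) of $f(s(\beta),\alpha,x)\,p_{s(\beta)}(x)$
are dominated on $N(\beta^*)\times N(\theta_T)$ by $\lambda$-integrable functions. The third partial
derivatives (w.r.t. $\beta$ and $\alpha$) of $h(s(\beta),\alpha,x)$ are dominated on $N(\beta^*)\times N(\theta_T)$ by
some $P_{\theta_T}$-integrable functions;

(C.3) The integrals $P_{\theta_T}\|(\partial/\partial\alpha)h(s(\beta^*),\theta_T)\|^2$, $P_{\theta_T}\|(\partial/\partial\beta)h(s(\beta^*),\theta_T)\|^2$,
$P_{\theta_T}\|(\partial^2/\partial\alpha^2)h(s(\beta^*),\theta_T)\|$, $P_{\theta_T}\|(\partial^2/\partial\beta^2)h(s(\beta^*),\theta_T)\|$ and
$P_{\theta_T}\|(\partial^2/\partial\beta\partial\alpha)h(s(\beta^*),\theta_T)\|$ are finite, and the matrix

\[ A := \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \]

is non singular, wherein $A_{11} := P_{\theta_T}(\partial^2/\partial\beta^2)h(s(\beta^*),\theta_T)$, $A_{22} := P_{\theta_T}(\partial^2/\partial\alpha^2)h(s(\beta^*),\theta_T)$
and $A_{12} = A_{21}^\top := P_{\theta_T}(\partial^2/\partial\beta\partial\alpha)h(s(\beta^*),\theta_T)$.

(C.4) The integral $P_{\theta_T} h(s(\beta^*),\theta_T)^2$ is finite.

Denote $\hat{\beta}_\phi$ and $\hat{\alpha}_\phi(\hat{\beta}_\phi)$ the min-max optimal solution of

\[ \widehat{D}_\phi(\Theta_0,\theta_T) := \inf_{\beta\in B_0}\sup_{\alpha\in\Theta} P_n h(s(\beta),\alpha), \]

and let $B(\beta^*, n^{-1/3}) := \left\{\beta \in B_0;\ \|\beta-\beta^*\| \le n^{-1/3}\right\}$, $c_n := \left(\hat{\beta}_\phi^\top, \hat{\alpha}_\phi(\hat{\beta}_\phi)^\top\right)^\top$, $c^* := \left(\beta^{*\top}, \theta_T^\top\right)^\top$,
and $F$ the matrix defined by

\[ F := P_{\theta_T} \begin{pmatrix} (\partial/\partial\beta)h(s(\beta^*),\theta_T) \\ (\partial/\partial\alpha)h(s(\beta^*),\theta_T) \end{pmatrix} \begin{pmatrix} (\partial/\partial\beta)h(s(\beta^*),\theta_T) \\ (\partial/\partial\alpha)h(s(\beta^*),\theta_T) \end{pmatrix}^\top. \]

(C.5) The estimates $\hat{\beta}_\phi$ and $\hat{\alpha}_\phi(\hat{\beta}_\phi)$ exist and are consistent estimators for $\beta^*$ and $\theta_T$
respectively.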

Theorem 3.8. Assume that conditions (C.1-2-3-4-5) hold. Then, under the alternative
hypothesis $\mathcal{H}_1$, we have

(a) $\sqrt{n}\,(c_n - c^*)$ converges in distribution to a centered multivariate normal random
variable with covariance matrix $V = A^{-1} F A^{-1}$.

(b) If additionally the condition (C.6) holds, then $\sqrt{n}\left(\widehat{D}_\phi(\Theta_0,\theta_T) - D_\phi(\Theta_0,\theta_T)\right)$ converges
in distribution to a centered normal random variable with variance

\[ \sigma_\phi^2(\beta^*,\theta_T) = P_{\theta_T} h(s(\beta^*),\theta_T)^2 - \left(P_{\theta_T} h(s(\beta^*),\theta_T)\right)^2. \tag{3.24} \]



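Remark 3.9. Using theorem 3.7, the estimate $\widehat{D}_\phi(\Theta_0,\theta_T)$ can be used to perform
statistical tests (asymptotically of level $\epsilon$) of the null hypothesis $\mathcal{H}_0 : \theta_T \in \Theta_0$ against the
alternative $\mathcal{H}_1 : \theta_T \in \Theta\setminus\Theta_0$. Since $D_\phi(\Theta_0,\theta_T)$ is nonnegative and takes value zero only
when $\theta_T \in \Theta_0$, the tests are defined through the critical region

\[ C_\phi(\Theta_0,\theta_T) := \left\{ \frac{2n}{\phi''(1)}\,\widehat{D}_\phi(\Theta_0,\theta_T) > q_{l,\epsilon} \right\} \tag{3.25} \]

where $q_{l,\epsilon}$ is the $(1-\epsilon)$-quantile of the $\chi^2$ distribution with $l$ degrees of freedom. Note that
these tests are all consistent, since $\widehat{D}_\phi(\Theta_0,\theta_T)$ are $n$-consistent estimates of $D_\phi(\Theta_0,\theta_T) = 0$
under $\mathcal{H}_0$, and $\sqrt{n}$-consistent estimates of $D_\phi(\Theta_0,\theta_T) > 0$ under $\mathcal{H}_1$; see theorem 3.7
and theorem 3.8 part (b). Further, the asymptotic result (b) in theorem 3.8 above can be
used to give an approximation to the power function $\theta_T \mapsto \beta(\theta_T) := P_{\theta_T}\left(C_\phi(\Theta_0,\theta_T)\right)$. We
obtain then the following approximation

\[ \beta(\theta_T) \approx 1 - F_N\left( \frac{\sqrt{n}}{\sigma_\phi(\beta^*,\theta_T)} \left[ \frac{\phi''(1)}{2n}\, q_{l,\epsilon} - D_\phi(\Theta_0,\theta_T) \right] \right) \tag{3.26} \]

where $F_N$ is the cumulative distribution function of a normal variable with mean zero and
variance one. An important application of this approximation is the approximate sample
size (3.27) below that ensures a power $\beta$ for a given alternative $\theta_T \in \Theta\setminus\Theta_0$. Let $n_0$ be the
positive root of the equation

\[ \beta = 1 - F_N\left( \frac{\sqrt{n}}{\sigma_\phi(\beta^*,\theta_T)} \left[ \frac{\phi''(1)}{2n}\, q_{l,\epsilon} - D_\phi(\Theta_0,\theta_T) \right] \right), \]

i.e.,

\[ n_0 = \frac{(a+b) - \sqrt{a(a+2b)}}{2\,D_\phi(\Theta_0,\theta_T)^2}, \]

where $a = \sigma_\phi^2(\beta^*,\theta_T)\left[F_N^{-1}(1-\beta)\right]^2$ and $b = \phi''(1)\, q_{l,\epsilon}\, D_\phi(\Theta_0,\theta_T)$.

The required sample size is then

\[ n^* = [n_0] + 1 \tag{3.27} \]

where $[\,\cdot\,]$ is used here to denote "integer part of".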

Remark 3.10. (Another view of the generalized likelihood ratio test for composite
hypotheses, and approximation of the power function through $KL_m$-divergence).
In the particular case of the $KL_m$-divergence, i.e., when $\phi(x) = \phi_0(x) :=
-\log x + x - 1$, we obtain from (3.25) the critical area

\[ C_{KL_m}(\Theta_0,\theta_T) = \left\{ 2\log \frac{\sup_{\theta\in\Theta}\prod_{i=1}^n p_\theta(X_i)}{\sup_{\theta\in\Theta_0}\prod_{i=1}^n p_\theta(X_i)} > q_{l,\epsilon} \right\}, \]

which is to say that the test obtained in this case is precisely the generalized likelihood
ratio test associated to (3.23). The power approximation and the approximate sample size
guaranteeing a power $\beta$ for a given alternative (for the GLRT) are given by (3.26) and
(3.27), respectively, where $\phi$ is replaced by $\phi_0$ and $D_\phi$ by $KL_m$.



4. Non regular models. A simple solution for the case of mixture models

The test problem for the number of components of a finite mixture has been extensively
treated when the total number of components $k$ is equal to 2, leading to a satisfactory
solution; the limit distribution of the generalized likelihood ratio statistic is non standard,
since it is $0.5\,\delta_0 + 0.5\,\chi^2(1)$, a mixture of a Dirac mass at 0 and a $\chi^2(1)$ with weights
equal to 1/2; see e.g. Titterington et al. (1985) and Self and Liang (1987). When $k > 2$,
the problem is much more involved. Self and Liang (1987) obtained the limit distribution
of the generalized likelihood ratio statistic, which is non standard and complex. This
result yields formidable numerical difficulties for the calculation of the critical value of
the test. In section 4.2 below, we propose a unified treatment for all these cases, with a
simple and standard limit distribution both when the parameter $\theta_T$ is an interior or a
boundary point of the parameter space $\Theta$. On the other hand, confidence regions for the
mixture parameter $\theta_T$, even when $k = 2$, are intractable through the generalized likelihood
ratio statistic. Indeed, the limit law of the generalized likelihood ratio statistic depends
heavily on whether $\theta_T$ is a boundary or an interior point of the parameter space. For
example, when $k = 2$, the limit distribution of the generalized likelihood ratio statistic is
$0.5\,\delta_0 + 0.5\,\chi^2(1)$ when $\theta = 0$ and $\chi^2(1)$ when $0 < \theta < 1$. Therefore, the confidence level is
not defined uniquely. In contrast, we will prove in section 4.3 that the proposed dual
$\chi^2$-statistic yields quite standard confidence regions even when $k > 2$.

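4.1. Notations. Let $\left\{P^{(1)}_{a_1};\ a_1 \in A_1\right\}, \ldots, \left\{P^{(k)}_{a_k};\ a_k \in A_k\right\}$ be $k$ parametric models where
$A_1,\ldots,A_k$ are $k$ ($k \ge 2$) sets in $\mathbb{R}^{d_1},\ldots,\mathbb{R}^{d_k}$ and $d_1,\ldots,d_k \in \mathbb{N}^*$. Denote $P_\theta$ the mixture
model

\[ P_\theta := \sum_{i=1}^{k} w_i\, P^{(i)}_{a_i} \tag{4.1} \]

where $0 \le w_i \le 1$, $\sum_{i=1}^k w_i = 1$ and

\[ \theta \in \Theta := \left\{ (w_1,\ldots,w_k,a_1,\ldots,a_k)^\top \in [0,1]^k \times A_1 \times\cdots\times A_k \ \text{such that}\ \sum_{i=1}^k w_i = 1 \right\} \tag{4.2} \]

and assume that the model is identifiable. Let $k_0 \in \{1,\ldots,k-1\}$. We test if $(k-k_0)$
components in (4.1) have null coefficients. We assume that their labels are $k_0+1,\ldots,k$.
Denote $\Theta_0$ the subset of $\Theta$ defined by

\[ \Theta_0 := \left\{\theta \in \Theta \ \text{such that}\ w_{k_0+1} = \cdots = w_k = 0\right\}. \]

On the basis of an i.i.d. sample $X_1,\ldots,X_n$ with distribution $P_{\theta_T}$, $\theta_T \in \Theta$, we intend to
perform tests of the hypothesis

\[ \mathcal{H}_0 : \theta_T \in \Theta_0 \quad\text{against the alternative}\quad \mathcal{H}_1 : \theta_T \in \Theta\setminus\Theta_0. \tag{4.3} \]

It is known that the generalized likelihood ratio test, based on the statistic

\[ 2\log\lambda := 2\log \frac{\sup_{\theta\in\Theta}\prod_{i=1}^n p_\theta(X_i)}{\sup_{\theta\in\Theta_0}\prod_{i=1}^n p_\theta(X_i)}, \tag{4.4} \]

is not valid for this problem, since the asymptotic approximation by the $\chi^2$ distribution does
not hold in this case; the problem is due to the fact that the null value of $\theta_T$ is not in the
interior of the parameter space $\Theta$. We now clarify this problem. For simplicity, consider
a mixture of two known densities $p_0$ and $p_1$ with $p_0 \neq p_1$:

\[ p_\theta = (1-\theta)\,p_0 + \theta\,p_1 \quad\text{where}\quad \theta \in \Theta := [0,1]. \tag{4.5} \]

Given data $X_1,\ldots,X_n$ with distribution $P_{\theta_T}$, $\theta_T \in [0,1]$, consider the test problem

\[ \mathcal{H}_0 : \theta_T = 0 \quad\text{against the alternative}\quad \mathcal{H}_1 : \theta_T > 0. \tag{4.6} \]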

The generalized likelihood ratio statistic for this test problem is

\[ W_n(0) := 2\log\frac{L(\hat{\theta})}{L(0)} \tag{4.7} \]

where $L(\theta) := \prod_{i=1}^n \left[(1-\theta)\,p_0(X_i) + \theta\,p_1(X_i)\right]$ for all $\theta \in [0,1]$, and $\hat{\theta}$ is the MLE of $\theta$.
Using the strict concavity of the function $\theta \in [0,1] \mapsto l(\theta) := \log L(\theta)$, it is clear that
$\hat{\theta} = 0$ whenever $l'(0)$, the derivative on the right at $\theta = 0$ of $\theta \mapsto l(\theta)$, is nonpositive.
Hence, we can write

\[ P_0\{W_n = 0\} \ge P_0\{\hat{\theta} = 0\} = P_0\{l'(0) \le 0\} = P_0\left\{ \frac{1}{\sqrt{n}}\sum_{i=1}^n (Y_i - 1) \le 0 \right\} \]

which, by the CLT, tends to 1/2 (if $E(Y_i^2) < \infty$ where $Y_i := p_1(X_i)/p_0(X_i)$) since the
random variables $Y_i$ are i.i.d. with $E(Y_i) = 1$ under $\mathcal{H}_0$. This proves that the convergence
in distribution of the generalized likelihood ratio statistic $W_n(0)$ to a $\chi^2$ random variable
(under $\mathcal{H}_0$) does not hold. Under suitable regularity conditions we can prove that the limit
distribution of the statistic $W_n$ in (4.7) is $0.5\,\delta_0 + 0.5\,\chi^2_1$, a mixture of the $\chi^2$-distribution
and the Dirac measure at zero; see Self and Liang (1987).
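The mass that the GLR statistic puts at zero can be checked by a small simulation. The sketch below is an illustration only, under the assumption $P_0 = N(0,1)$ and $P_1 = N(0.5,1)$ (the pair used for figure 1), for which $Y_i = p_1(X_i)/p_0(X_i) = \exp(0.5\,X_i - 0.125)$; the function name and the seed are hypothetical choices.

```python
import math
import random


def frac_glr_zero(n, runs, seed=0):
    """Fraction of samples drawn from P0 = N(0,1) for which l'(0) <= 0, i.e. W_n = 0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        # l'(0) = sum_i (Y_i - 1) with Y_i = p1(X_i)/p0(X_i) = exp(0.5*X_i - 0.125)
        score = sum(math.exp(0.5 * rng.gauss(0.0, 1.0) - 0.125) - 1.0 for _ in range(n))
        if score <= 0.0:
            hits += 1
    return hits / runs


fraction = frac_glr_zero(n=200, runs=2000)
```

By the CLT argument above, `fraction` should be close to 1/2 for large `n`, which is incompatible with a $\chi^2$ limit for $W_n(0)$.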



Moreover, in the case of more than two components and $k - k_0 \ge 2$, the limit distribution
of the GLR statistic (4.4) under $\mathcal{H}_0$ is complicated and not standard (not a $\chi^2$
distribution), which poses some difficulty in determining the critical value that will give
correct asymptotic size; see Self and Liang (1987). On the other hand, the likelihood ratio
statistic

\[ W_n(\theta) := 2\log\frac{L(\hat{\theta})}{L(\theta)} \tag{4.9} \]

cannot be used to construct an asymptotic confidence region for the parameter $\theta_T$ since its
limit law is not the same when $\theta_T = 0$ and $\theta_T > 0$.
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Figure 1 . Empirical distribution of the GLR and its limit distribution 

In figure 1, we illustrate the accuracy of the approximation of the distribution of the GLR
by its limit $0.5\,\delta_0 + 0.5\,\chi^2_1$; we plot the cumulative distribution function (c.d.f.) of both the
limit law and the observed GLR's obtained from 1000 independent runs of samples with
sizes $n = 200$, $n = 500$ and $n = 1000$, with $P_0 = \mathcal{N}(0,1)$ and $P_1 = \mathcal{N}(0.5,1)$.

4.2. A simple solution to the problem of testing the number of components
in a mixture. We propose the following simple solution: consider the following set of
signed finite measures

\[ p_\theta := (1-\theta)\,p_0 + \theta\,p_1 \quad\text{where}\quad \theta \in \mathbb{R}. \tag{4.10} \]

This set (of signed finite measures with mass one) obviously contains the mixture model
(4.5). In particular, the null value of $\theta_T$ (i.e., $\theta_T = 0$) is an interior point of the parameter
space $\mathbb{R}$. The likelihood ratio test (for a model of signed measures) cannot be used since
the log-likelihood $l(\theta)$ may be infinite (when $\theta < 0$ or $\theta > 1$). In the context of divergences,
this means that the estimate $\widehat{KL}_m(P_0, P_{\theta_T})$ may be infinite if we consider the model (4.10),
which is due to the fact that the corresponding convex function $\phi(x) = -\log x + x - 1$
is infinite on $\mathbb{R}_-$. This suggests using a divergence associated to a convex function $\phi$
which is finite on all $\mathbb{R}$, for instance, the $\chi^2$-divergence (which is associated to the convex
function $\phi(x) = \frac{1}{2}(x-1)^2$). So, in order to perform a test asymptotically of level $\epsilon$ for
(4.6), we propose to use the following estimate of the $\chi^2$-divergence between $P_0$ and $P_{\theta_T}$,

\[ \widehat{\chi^2}(0,\theta_T) = \sup_{\alpha\in\Theta_e} \left\{ P_0 f(0,\alpha) - P_n g(0,\alpha) \right\}, \tag{4.11} \]

where $f(0,\alpha) = p_0/p_\alpha - 1$ and $g(0,\alpha) = \frac{1}{2}(p_0/p_\alpha + 1)(p_0/p_\alpha - 1)$ as a consequence of
definitions (3.9) and (3.8), and $\Theta_e$ is the new parameter space which we define as follows

\[ \Theta_e := \left\{ \alpha \in \mathbb{R} \ \text{such that}\ \int |f(0,\alpha)|\, dP_0 \ \text{is finite} \right\}. \]

The value of the parameter $\theta_T$ under the null hypothesis $\mathcal{H}_0$, i.e., $\theta_T = 0$, is in the interior
of the new parameter space $\Theta_e$ which is generally non void. Hence, under conditions of
theorem 3.2 where $\Theta$ is replaced by $\Theta_e$ and $\theta$ by zero, under $\mathcal{H}_0$ the statistic $2n\,\widehat{\chi^2}(0,\theta_T)$
converges in distribution to a $\chi^2$ random variable with one degree of freedom; the critical
region takes then the form

\[ CR := \left\{ 2n\,\widehat{\chi^2}(0,\theta_T) > q_{1,\epsilon} \right\}, \tag{4.12} \]

where $q_{1,\epsilon}$ is the $(1-\epsilon)$-quantile of the $\chi^2$ distribution with one degree of freedom. Obviously
other divergences which are associated to convex functions finite on all $\mathbb{R}$ can be
used. The use of the $\chi^2$-divergence is recommended. Indeed, for regular cases (for example
for multinomial goodness-of-fit tests) the $\chi^2$-test is equivalent (in Pitman sense) to the
generalized likelihood ratio one; see also Cressie and Read (1984) sections 3.1 and 3.2 for
other motivations in favor of the $\chi^2$ approach.
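The statistic in (4.11) and the test (4.12) can be sketched on a toy example. The code below is only an illustration under simplifying assumptions that are not in the text: $p_0$ and $p_1$ are hypothetical probability mass functions on three points (so that all integrals become finite sums), and the supremum over $\Theta_e$ is replaced by a maximum over a grid of $\alpha$ values on which $p_\alpha$ stays positive.

```python
import random
from statistics import NormalDist

# Hypothetical three-point example: p0 and p1 are known probability mass functions.
P0 = [0.5, 0.3, 0.2]
P1 = [0.2, 0.3, 0.5]


def p_mix(a, x):
    """Mass at x of the (possibly signed) measure (1 - a) * p0 + a * p1."""
    return (1.0 - a) * P0[x] + a * P1[x]


def dual_objective(a, freq):
    """P_0 f(0, a) - P_n g(0, a), with f = p0/p_a - 1 and g = (p0/p_a + 1)(p0/p_a - 1)/2."""
    total = 0.0
    for x in range(len(P0)):
        u = P0[x] / p_mix(a, x)
        total += P0[x] * (u - 1.0) - freq[x] * 0.5 * (u + 1.0) * (u - 1.0)
    return total


def dual_chi2_statistic(sample):
    """2n times the maximum of the dual objective over a grid of a in [-0.5, 1.5]."""
    n = len(sample)
    freq = [sample.count(x) / n for x in range(len(P0))]
    grid = [i / 1000.0 for i in range(-500, 1501)]  # every p_mix(a, x) > 0 on this grid
    return 2.0 * n * max(dual_objective(a, freq) for a in grid)


def draw(theta, n, rng):
    """Sample n points from the mixture p_theta, for theta in [0, 1]."""
    weights = [p_mix(theta, x) for x in range(len(P0))]
    return rng.choices(range(len(P0)), weights=weights, k=n)


rng = random.Random(1)
q = NormalDist().inv_cdf(0.975) ** 2  # 0.95-quantile of chi-square(1)
stat_null = dual_chi2_statistic(draw(0.0, 2000, rng))  # data generated under H0
stat_alt = dual_chi2_statistic(draw(0.3, 2000, rng))   # data generated under H1
```

The null hypothesis is rejected when the statistic exceeds `q`. Since the grid contains $\alpha = 0$, where the objective vanishes, the statistic is always nonnegative, and negative values of $\alpha$ are allowed, which is what makes $\theta_T = 0$ an interior point.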

In figure 2, we illustrate the accuracy of the approximation of the distribution of the
proposed dual $\chi^2$-statistic by the $\chi^2(1)$; we plot the cumulative distribution function
(c.d.f.) of both the limit law and the dual $\chi^2$-statistic obtained from 1000 independent
runs of samples with sizes $n = 200$, $n = 500$ and $n = 1000$, with $P_0 = \mathcal{N}(0,1)$ and
$P_1 = \mathcal{N}(0.5,1)$. We observe that the approximation is as satisfactory as it is in figure 1
for the GLR case, so that the extension of the model to signed finite measures does not
affect the quality of the approximation of the limit distribution.

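4.3. Confidence regions for the mixture parameters. We propose the following
solution to the confidence region problem when the parameter may be a boundary value
of the parameter space: the estimate

\[ \widehat{\chi^2}(\theta,\theta_T) = \sup_{\alpha\in\Theta_e(\theta)} \left\{ P_\theta f(\theta,\alpha) - P_n g(\theta,\alpha) \right\}, \tag{4.13} \]

where

\[ \Theta_e(\theta) := \left\{ \alpha \in \mathbb{R} \ \text{such that}\ \int |f(\theta,\alpha)|\, dP_\theta \ \text{is finite} \right\}, \]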
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Figure 2. Empirical distribution of the dual x^-statistic and its limit law 

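can be used to construct an asymptotic confidence region for the parameter $\theta_T$ with level
$(1-\epsilon)$ defined by

\[ C := \left\{ \theta \in \Theta \ \text{such that}\ 2n\,\widehat{\chi^2}(\theta,\theta_T) \le q_{1,\epsilon} \right\}. \]

In fact, $\lim_{n\to\infty} P_{\theta_T}(\theta_T \in C) = 1-\epsilon$ both when $\theta_T = 0$ or $\theta_T > 0$ since the statistic
$2n\,\widehat{\chi^2}(\theta_T,\theta_T)$ converges in distribution to a $\chi^2$ random variable with one degree of freedom
both when $\theta_T = 0$ or $\theta_T > 0$.

We now give the form of the critical region and the confidence
region in the multivariate case, i.e., in the case of the general model (4.1). For all $\theta \in \Theta$,
define the set

\[ \Theta_e(\theta) := \left\{ \alpha \in \mathbb{R}^k \times A_1 \times\cdots\times A_k \ \text{such that}\ \sum_{i=1}^k w_i = 1 \ \text{and}\ \int |f(\theta,\alpha)|\, dP_\theta \ \text{is finite} \right\}, \]

and the statistic

\[ \widehat{\chi^2}(\Theta_0,\theta_T) := \inf_{\theta\in\Theta_0} \widehat{\chi^2}(\theta,\theta_T) := \inf_{\theta\in\Theta_0}\sup_{\alpha\in\Theta_e(\theta)} \left\{ P_\theta f(\theta,\alpha) - P_n g(\theta,\alpha) \right\}. \]

Under some conditions similar to those in theorems 3.1, 3.2 and 3.3, we can prove, under
the null hypothesis $\mathcal{H}_0$ in (4.3), that the statistic $2n\,\widehat{\chi^2}(\Theta_0,\theta_T)$ converges in distribution
to a $\chi^2$ random variable with $(k-k_0)$ degrees of freedom. Also, the statistic $2n\,\widehat{\chi^2}(\theta,\theta_T)$
when $\theta = \theta_T$ converges in distribution to a $\chi^2$ random variable with $d := k - 1 + d_1 + \cdots + d_k$
degrees of freedom in both cases, when $\theta_T$ is a boundary value or not. Hence, the critical
region is given by

\[ CR := \left\{ 2n\,\widehat{\chi^2}(\Theta_0,\theta_T) > q_{k-k_0,\epsilon} \right\}, \]

and

\[ C := \left\{ \theta \in \Theta \ \text{such that}\ 2n\,\widehat{\chi^2}(\theta,\theta_T) \le q_{d,\epsilon} \right\} \]

is an asymptotic confidence region for $\theta_T$ of level $(1-\epsilon)$ both when $\theta_T$ is a boundary value or
not.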
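The univariate confidence region of this section, the set of $\theta \in [0,1]$ with $2n\,\widehat{\chi^2}(\theta,\theta_T) \le q_{1,\epsilon}$, can be sketched as follows. As before this is only an illustration under assumptions that are not in the text: $p_0$ and $p_1$ are hypothetical mass functions on two points, and both the supremum over $\alpha$ and the scan over $\theta$ are carried out on grids.

```python
import random
from statistics import NormalDist

# Hypothetical two-point example with known component mass functions p0 and p1.
P0 = [0.7, 0.3]
P1 = [0.3, 0.7]
ALPHAS = [i / 1000.0 for i in range(-500, 1501)]  # every p_mix(a, x) > 0 on this grid


def p_mix(t, x):
    """Mass at x of the (possibly signed) measure (1 - t) * p0 + t * p1."""
    return (1.0 - t) * P0[x] + t * P1[x]


def dual_chi2(theta, freq):
    """Grid version of sup_a {P_theta f(theta, a) - P_n g(theta, a)} from (4.13)."""
    best = float("-inf")
    for a in ALPHAS:
        value = 0.0
        for x in range(len(P0)):
            u = p_mix(theta, x) / p_mix(a, x)
            value += p_mix(theta, x) * (u - 1.0) - freq[x] * 0.5 * (u * u - 1.0)
        best = max(best, value)
    return best


def confidence_region(sample, eps=0.05):
    """Grid points theta in [0, 1] such that 2n * dual_chi2(theta) <= q_{1, eps}."""
    n = len(sample)
    freq = [sample.count(x) / n for x in range(len(P0))]
    q = NormalDist().inv_cdf(1.0 - eps / 2.0) ** 2  # chi-square(1) quantile
    thetas = [j / 100.0 for j in range(101)]
    return [t for t in thetas if 2.0 * n * dual_chi2(t, freq) <= q]


def draw(theta, n, rng):
    """Sample n points from the mixture p_theta, for theta in [0, 1]."""
    weights = [p_mix(theta, x) for x in range(len(P0))]
    return rng.choices(range(len(P0)), weights=weights, k=n)


# Region computed from data generated at the boundary value theta_T = 0.
region_boundary = confidence_region(draw(0.0, 1000, random.Random(2)))
```

The same quantile `q` is used whether the true parameter is at the boundary or in the interior, which is the point of the construction; with the likelihood ratio statistic the appropriate quantile would depend on the unknown position of $\theta_T$.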

4.4. Approximation of the power function of the likelihood ratio statistic: simulation
results. In the context of the exponential model $p_\theta(x) = \theta\exp\{-\theta x\}$, we consider
the problem of testing

\[ \mathcal{H}_0 : \theta_T = 1 \quad\text{versus}\quad \mathcal{H}_1 : \theta_T \neq 1 \]

using the GLR. We recall that the power function of the GLR test is

\[ \theta_T \mapsto \beta(\theta_T) := P_{\theta_T}\left\{ 2n\,\widehat{KL}_m(1,\theta_T) > q_{1,0.05} \right\} \tag{4.14} \]

and its approximation is

\[ \beta(\theta_T) \approx 1 - F_N\left( \frac{\sqrt{n}}{\sigma_\phi(1,\theta_T)} \left[ \frac{q_{1,0.05}}{2n} - KL_m(1,\theta_T) \right] \right) \tag{4.15} \]

where $F_N$ is the cumulative distribution function of a normal random variable with mean
zero and variance one, and $\phi(x) = -\log x + x - 1$; see remarks 3.6 and 3.7 above. The
power function (4.14) is plotted (with continuous line) for sample sizes $n = 50$, $n = 100$,
$n = 300$ and $n = 500$, and for different values of $\theta_T$. Each power entry was obtained
from 1000 independent runs. The approximation (4.15) is plotted as a function of $\theta_T$ by a
dashed line. We observe (see figure 3) that the approximation is accurate for alternatives
which are not "close to" the null hypothesis even for moderate sample sizes.
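The comparison between the simulated power and its normal approximation can be sketched as follows. The code is an illustration under the assumption that the exponential model is parametrized by its rate, $p_\theta(x) = \theta e^{-\theta x}$ for $x > 0$, with null value $\theta_0 = 1$; in that case $KL_m(1,\theta_T) = \log\theta_T - 1 + 1/\theta_T$, the standard deviation of $\log\left(p_{\theta_T}/p_1\right)(X)$ under $P_{\theta_T}$ is $|\theta_T - 1|/\theta_T$, and the GLR statistic is $2n\left(\bar{X} - 1 - \log\bar{X}\right)$. The function names and the seed are hypothetical.

```python
import math
import random
from statistics import NormalDist

Q = NormalDist().inv_cdf(0.975) ** 2  # q_{1,0.05}: 0.95-quantile of chi-square(1)


def power_approximation(theta_t, n):
    """Normal approximation of the GLR power for H0: theta_T = 1 (rate parametrization)."""
    kl = math.log(theta_t) - 1.0 + 1.0 / theta_t  # KL_m(1, theta_T)
    sigma = abs(theta_t - 1.0) / theta_t          # std. dev. of log(p_thetaT / p_1)(X)
    z = math.sqrt(n) / sigma * (Q / (2.0 * n) - kl)
    return 1.0 - NormalDist().cdf(z)


def power_monte_carlo(theta_t, n, runs=1000, seed=0):
    """Rejection frequency of the GLR test 2n * (mean - 1 - log(mean)) > Q."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(runs):
        mean = sum(rng.expovariate(theta_t) for _ in range(n)) / n
        # The likelihood is maximized at theta = 1/mean, giving this closed form.
        if 2.0 * n * (mean - 1.0 - math.log(mean)) > Q:
            rejections += 1
    return rejections / runs


approx = power_approximation(1.5, 100)
simulated = power_monte_carlo(1.5, 100)
```

Comparing `approx` and `simulated` for a range of values of `theta_t` and `n` gives a numerical counterpart of the comparison described in the text.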

5. Concluding remarks and possible developments 

We have addressed the parametric estimation and test problems. We have introduced
new estimation and test procedures using divergence minimization and a duality technique
for discrete or continuous parametric models, avoiding the smoothing method. The procedure
leads to optimal estimates for the model parameter and for the divergences. It
includes both the discrete (finite or infinite) and the continuous support cases. It extends
the maximum likelihood method for both estimation and test problems. Moreover, the
procedure and the divergences framework make it possible to obtain the limit laws of the proposed
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Figure 3. Approximation of the power function 

estimates and the test statistics both under the null and the alternative (simple or composite)
hypotheses, including the generalized likelihood ratio statistic. As a by-product,
we obtain explicit power functions in a general case for simple or composite parametric
test problems, and approximations of the minimal sample size which guarantees a desired
power for a given alternative. A new test and new asymptotic confidence regions are proposed
in the case where the parameter may be a boundary value of the parameter space.
Many problems remain to be studied in the future, such as the choice of the divergence
which leads to an "optimal" (in some sense) estimate or test in terms of efficiency and
robustness, the construction of convergent estimates and test statistics by divergence when
the maximum likelihood is not consistent (for example for location families for which the
expectation does not exist), the Bartlett correctability and the large deviation properties
of the proposed statistics $\widehat{D}_\phi$.

6. Appendix 

Proof of proposition 3.1. (1) We will prove the consistency of the estimate $\widehat{D}_\phi(\theta,\theta_T)$.
We have

\[ \left| \widehat{D}_\phi(\theta,\theta_T) - D_\phi(\theta,\theta_T) \right| = \left| P_n h(\theta,\hat{\alpha}_\phi(\theta)) - P_{\theta_T} h(\theta,\theta_T) \right| =: |A|, \]

which implies

\[ P_n h(\theta,\theta_T) - P_{\theta_T} h(\theta,\theta_T) \le A \le P_n h(\theta,\hat{\alpha}_\phi(\theta)) - P_{\theta_T} h(\theta,\hat{\alpha}_\phi(\theta)). \]

Both the RHS and the LHS terms in the above display go to 0, under condition (c.2).
This implies that $A$ tends to 0.

(2) For the consistency of $\hat{\alpha}_\phi(\theta)$, we refer to van der Vaart (1998) theorem 5.7.



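Proof of theorem 3.2. (a) Using (A.1), simple calculus gives

\[ P_{\theta_T}(\partial/\partial\alpha)h(\theta,\theta_T) = 0 \tag{6.1} \]

and

\[ P_{\theta_T}(\partial^2/\partial\alpha^2)h(\theta,\theta_T) = -\int \phi''\!\left(\frac{p_\theta}{p_{\theta_T}}\right) \frac{p_\theta^2}{p_{\theta_T}^3}\, \dot{p}_{\theta_T}\dot{p}_{\theta_T}^\top\, d\lambda =: -S. \tag{6.2} \]

Observe that the matrix $S$ is symmetric and positive since the second derivative $\phi''$ is
nonnegative by the convexity of $\phi$. Let $U_n(\theta_T) := P_n(\partial/\partial\alpha)h(\theta,\theta_T)$, and use (6.1) and
(A.2) in connection with the Central Limit Theorem (CLT) to see that

\[ \sqrt{n}\,U_n(\theta_T) \to \mathcal{N}(0, M) \quad\text{in distribution}. \tag{6.3} \]

Also, let $V_n(\theta_T) := P_n(\partial^2/\partial\alpha^2)h(\theta,\theta_T)$, and use (6.2) and (A.2) in connection with the
Law of Large Numbers (LLN) to conclude that

\[ V_n(\theta_T) \to -S \quad\text{(a.s.)}. \tag{6.4} \]

Using the fact that $P_n(\partial/\partial\alpha)h(\theta,\hat{\alpha}) = 0$ and a Taylor expansion of $P_n(\partial/\partial\alpha)h(\theta,\hat{\alpha})$ in $\hat{\alpha}$
around $\theta_T$, we obtain

\[ 0 = P_n(\partial/\partial\alpha)h(\theta,\hat{\alpha}) = P_n(\partial/\partial\alpha)h(\theta,\theta_T) + (\hat{\alpha}-\theta_T)^\top P_n(\partial^2/\partial\alpha^2)h(\theta,\theta_T) + o_P(n^{-1/2}). \]

Hence,

\[ \sqrt{n}\,(\hat{\alpha}-\theta_T) = -V_n(\theta_T)^{-1}\sqrt{n}\,U_n(\theta_T) + o_P(1). \tag{6.5} \]

Using (6.3) and (6.4) and Slutsky's theorem, we conclude then

\[ \sqrt{n}\,(\hat{\alpha}-\theta_T) \to \mathcal{N}\left(0, V_\phi(\theta,\theta_T)\right) \quad\text{in distribution} \tag{6.6} \]

where $V_\phi(\theta,\theta_T)$ is given in part (a) of theorem 3.2. When $\theta_T = \theta$, direct calculus shows
that $V_\phi(\theta,\theta_T) = I_{\theta_T}^{-1}$.

(b) Assume that $\theta_T = \theta$. From (6.5), using the convergence (6.4), we get

\[ \sqrt{n}\,(\hat{\alpha}-\theta_T) = S^{-1}\sqrt{n}\,U_n(\theta_T) + o_P(1). \tag{6.7} \]

On the other hand, a Taylor expansion of $[2n/\phi''(1)]\,\widehat{D}_\phi(\theta,\theta_T) = [2n/\phi''(1)]\,P_n h(\theta,\hat{\alpha})$
in $\hat{\alpha}$ around $\theta_T$, using the fact that $P_n h(\theta,\theta_T) = 0$ when $\theta_T = \theta$, gives

\[ \frac{2n}{\phi''(1)}\widehat{D}_\phi(\theta,\theta_T) = \frac{2n}{\phi''(1)}\,U_n(\theta_T)^\top(\hat{\alpha}-\theta_T) + \frac{n}{\phi''(1)}\,(\hat{\alpha}-\theta_T)^\top V_n(\theta_T)\,(\hat{\alpha}-\theta_T) + o_P(1). \]

Use (6.4), (6.7) and the fact that $S = \phi''(1)\,I_{\theta_T}$ when $\theta_T = \theta$ to conclude that

\[ \frac{2n}{\phi''(1)}\widehat{D}_\phi(\theta,\theta_T) = \frac{1}{\phi''(1)^2}\,\sqrt{n}\,U_n(\theta_T)^\top I_{\theta_T}^{-1}\,\sqrt{n}\,U_n(\theta_T) + o_P(1). \]

Finally, use the convergence (6.3) and the fact that $M = \phi''(1)^2 I_{\theta_T}$ when $\theta = \theta_T$, to conclude
that $[2n/\phi''(1)]\,\widehat{D}_\phi(\theta,\theta_T)$ converges in distribution to a $\chi^2$ variable with $d$ degrees
of freedom when $\theta = \theta_T$.

(c) Assume that $\theta_T \neq \theta$. A Taylor expansion of $\widehat{D}_\phi(\theta,\theta_T) = P_n h(\theta,\hat{\alpha})$, in $\hat{\alpha}$ around
$\theta_T$, using the fact that $P_{\theta_T}(\partial/\partial\alpha)h(\theta,\theta_T) = 0$, gives $\widehat{D}_\phi(\theta,\theta_T) = P_n h(\theta,\theta_T) + o_P(n^{-1/2})$.
Hence,

\[ \sqrt{n}\left(\widehat{D}_\phi(\theta,\theta_T) - D_\phi(\theta,\theta_T)\right) = \sqrt{n}\left[P_n h(\theta,\theta_T) - P_{\theta_T} h(\theta,\theta_T)\right] + o_P(1), \]

which under assumption (A.3), by the CLT, converges in distribution to a centred normal
variable with variance $\sigma^2(\theta,\theta_T) = P_{\theta_T} h(\theta,\theta_T)^2 - \left(P_{\theta_T} h(\theta,\theta_T)\right)^2$.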

Proof of theorem 3.3. (a) For any $\alpha = \theta_T + u\,n^{-1/3}$ with $|u| \le 1$, consider a Taylor
expansion of $P_n h(\theta,\alpha)$ in $\alpha$ around $\theta_T$, and use (A.1) to see that

\[ n P_n h(\theta,\alpha) - n P_n h(\theta,\theta_T) = n^{2/3} u^\top U_n + 2^{-1} n^{1/3} u^\top V_n u + O(1) \quad\text{(a.s.)} \]

uniformly on $u$ with $|u| \le 1$. Now, use (6.4) and the fact that $U_n = O\left(n^{-1/2}(\log\log n)^{1/2}\right)$
(a.s.) to conclude that

\[ n P_n h(\theta,\alpha) - n P_n h(\theta,\theta_T) = O\left(n^{1/6}(\log\log n)^{1/2}\right) - 2^{-1} u^\top S u\, n^{1/3} + O(1) \quad\text{(a.s.)} \]

uniformly on $u$ with $|u| \le 1$. Hence, uniformly on the surface of the ball $B$ (i.e., uniformly
on $u$ with $|u| = 1$), we have

\[ n P_n h(\theta,\alpha) - n P_n h(\theta,\theta_T) \le O\left(n^{1/6}(\log\log n)^{1/2}\right) - 2^{-1} c\, n^{1/3} + O(1) \quad\text{(a.s.)} \tag{6.8} \]

where $c$ is the smallest eigenvalue of the matrix $S$. Note that $c$ is positive since $S$ is
positive definite (it is symmetric, positive and non singular by assumption A.2). In view
of (6.8), by the continuity of $\alpha \mapsto n P_n h(\theta,\alpha) - n P_n h(\theta,\theta_T)$ and since it takes value zero
at $\alpha = \theta_T$ and is asymptotically negative on the surface of $B$, it holds that as $n \to \infty$,
with probability one, $\alpha \mapsto P_n h(\theta,\alpha)$ attains its maximum value at some point $\tilde{\alpha}_\phi(\theta)$ in
the interior of the ball $B$, and therefore the estimate $\tilde{\alpha}_\phi(\theta)$ satisfies $P_n(\partial/\partial\alpha)h(\theta,\tilde{\alpha}_\phi(\theta)) = 0$
and $\tilde{\alpha}_\phi(\theta) - \theta_T = O(n^{-1/3})$.

The proofs of parts (b), (c) and (d) are similar to those of parts (a), (b) and (c) in theorem
3.2. Hence, they are omitted.

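Proof of proposition 3.4. We prove (1). Under conditions (c.4-5-6),
we prove that $\sup_{\theta\in\Theta}\|\hat{\alpha}_\phi(\theta) - \theta_T\|$ tends to 0. By the very definition of $\hat{\alpha}_\phi(\theta)$ and the
condition (c.5), we have

\[ P_n h(\theta,\hat{\alpha}_\phi(\theta)) \ge P_n h(\theta,\theta_T) \ge P_{\theta_T} h(\theta,\theta_T) - o_P(1), \]

where $o_P(1)$ does not depend upon $\theta$ (due to condition (c.5)). Hence, we have for all $\theta \in \Theta$

\[ P_{\theta_T} h(\theta,\theta_T) - P_{\theta_T} h(\theta,\hat{\alpha}_\phi(\theta)) \le P_n h(\theta,\hat{\alpha}_\phi(\theta)) - P_{\theta_T} h(\theta,\hat{\alpha}_\phi(\theta)) + o_P(1). \tag{6.9} \]

The RHS term is less than $\sup_{\{\alpha,\theta\in\Theta\}} \left|P_n h(\theta,\alpha) - P_{\theta_T} h(\theta,\alpha)\right| + o_P(1)$ which, by (c.5),
tends to 0. Let $\epsilon > 0$ be such that $\sup_{\theta\in\Theta}\|\hat{\alpha}_\phi(\theta) - \theta_T\| > \epsilon$. There exists some $a_n \in \Theta$
such that $\|\hat{\alpha}_\phi(a_n) - \theta_T\| > \epsilon$. Together with (c.5.a), there exists some $\eta > 0$ such that
$P_{\theta_T} h(a_n,\theta_T) - P_{\theta_T} h(a_n,\hat{\alpha}_\phi(a_n)) > \eta$. We then conclude that

\[ P\left\{ \sup_{\theta\in\Theta}\|\hat{\alpha}_\phi(\theta) - \theta_T\| > \epsilon \right\} \le P\left\{ P_{\theta_T} h(a_n,\theta_T) - P_{\theta_T} h(a_n,\hat{\alpha}_\phi(a_n)) > \eta \right\}, \]

and the RHS term tends to 0 by (6.9). This concludes the proof of part (1).

We prove (2). By the very definition of $\hat{\theta}_\phi$, conditions (c.5) and (c.6) and part (1), we
have

\[ P_n h(\hat{\theta}_\phi,\hat{\alpha}_\phi(\hat{\theta}_\phi)) \le P_n h(\theta_T,\hat{\alpha}_\phi(\theta_T)) \le P_{\theta_T} h(\theta_T,\hat{\alpha}_\phi(\hat{\theta}_\phi)) + o_P(1), \]

from which

\[ P_{\theta_T} h(\hat{\theta}_\phi,\hat{\alpha}_\phi(\hat{\theta}_\phi)) - P_{\theta_T} h(\theta_T,\hat{\alpha}_\phi(\hat{\theta}_\phi)) \le P_{\theta_T} h(\hat{\theta}_\phi,\hat{\alpha}_\phi(\hat{\theta}_\phi)) - P_n h(\hat{\theta}_\phi,\hat{\alpha}_\phi(\hat{\theta}_\phi)) + o_P(1) \]
\[ \le \sup_{\{\alpha,\theta\in\Theta\}} \left|P_n h(\theta,\alpha) - P_{\theta_T} h(\theta,\alpha)\right| + o_P(1). \tag{6.10} \]

Further, by part (1) and condition (c.5.b), for any positive $\epsilon$, there exists $\eta > 0$ such that

\[ P\left\{ \|\hat{\theta}_\phi - \theta_T\| > \epsilon \right\} \le P\left\{ P_{\theta_T} h(\hat{\theta}_\phi,\hat{\alpha}_\phi(\hat{\theta}_\phi)) - P_{\theta_T} h(\theta_T,\hat{\alpha}_\phi(\hat{\theta}_\phi)) > \eta \right\} \]

and the RHS term, under condition (c.5), tends to 0 by (6.10). This concludes the proof.

Proof of theorem 3.5. Under condition (A.5), simple calculus gives

\[ P_{\theta_T}\frac{\partial}{\partial\alpha}h(\theta_T,\theta_T) = P_{\theta_T}\frac{\partial}{\partial\theta}h(\theta_T,\theta_T) = 0, \tag{6.11} \]

\[ P_{\theta_T}\frac{\partial^2}{\partial\theta^2}h(\theta_T,\theta_T) = -P_{\theta_T}\frac{\partial^2}{\partial\alpha^2}h(\theta_T,\theta_T) = \phi''(1)\,I_{\theta_T}, \tag{6.12} \]

and

\[ P_{\theta_T}\left[\frac{\partial}{\partial\theta}h\,\frac{\partial}{\partial\theta}h^\top\right](\theta_T,\theta_T) = P_{\theta_T}\left[\frac{\partial}{\partial\alpha}h\,\frac{\partial}{\partial\alpha}h^\top\right](\theta_T,\theta_T) = -P_{\theta_T}\left[\frac{\partial}{\partial\alpha}h\,\frac{\partial}{\partial\theta}h^\top\right](\theta_T,\theta_T) = \phi''(1)^2\,I_{\theta_T}. \tag{6.13} \]

Denote $U_n(\theta,\theta_T) := P_n(\partial/\partial\alpha)h(\theta,\theta_T)$, $V_n(\theta,\theta_T) := P_n(\partial^2/\partial\alpha^2)h(\theta,\theta_T)$, $S(\theta,\theta_T) :=
-P_{\theta_T}(\partial^2/\partial\alpha^2)h(\theta,\theta_T)$ and $a_n := \left((\hat{\theta}_\phi - \theta_T)^\top, (\hat{\alpha}_\phi(\hat{\theta}_\phi) - \theta_T)^\top\right)^\top$. Under conditions (A.4-5),
by a Taylor expansion, we obtain

\[ \sqrt{n}\,a_n = \sqrt{n} \begin{pmatrix} \frac{1}{\phi''(1)} I_{\theta_T}^{-1} & 0 \\ 0 & -\frac{1}{\phi''(1)} I_{\theta_T}^{-1} \end{pmatrix} \begin{pmatrix} -P_n\frac{\partial}{\partial\theta}h(\theta_T,\theta_T) \\ -P_n\frac{\partial}{\partial\alpha}h(\theta_T,\theta_T) \end{pmatrix} + o_P(1). \]

We therefore deduce, by the CLT, that, under condition (A.6), $\sqrt{n}\,a_n$ converges in distribution
to a centred normal variable with covariance matrix

\[ \begin{pmatrix} I_{\theta_T}^{-1} & I_{\theta_T}^{-1} \\ I_{\theta_T}^{-1} & I_{\theta_T}^{-1} \end{pmatrix} \tag{6.14} \]

which completes the proof of theorem 3.5.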
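The form of the limiting covariance matrix can be checked by a direct computation. The sketch below assumes that $h(\theta,\alpha,x) = \int \phi'(p_\theta/p_\alpha)\,dP_\theta - \left[(p_\theta/p_\alpha)\,\phi'(p_\theta/p_\alpha) - \phi(p_\theta/p_\alpha)\right](x)$, as the $\chi^2$ case of section 4.2 suggests, that $\phi'(1) = 0$, and that differentiation under the integral sign is allowed; $\dot{p}_{\theta_T}$ denotes the gradient of $\theta \mapsto p_\theta$ at $\theta_T$.

```latex
\begin{align*}
\frac{\partial}{\partial\alpha} h(\theta_T,\theta_T,x) &= \phi''(1)\,\frac{\dot p_{\theta_T}(x)}{p_{\theta_T}(x)} =: \psi(x),
\qquad
\frac{\partial}{\partial\theta} h(\theta_T,\theta_T,x) = -\psi(x),
\qquad
P_{\theta_T}\,\psi\psi^\top = \phi''(1)^2\, I_{\theta_T}, \\
\operatorname{Cov}\begin{pmatrix} -\partial h/\partial\theta \\ -\partial h/\partial\alpha \end{pmatrix}
&= \phi''(1)^2 \begin{pmatrix} I_{\theta_T} & -I_{\theta_T} \\ -I_{\theta_T} & I_{\theta_T} \end{pmatrix}, \\
\begin{pmatrix} \frac{I_{\theta_T}^{-1}}{\phi''(1)} & 0 \\ 0 & -\frac{I_{\theta_T}^{-1}}{\phi''(1)} \end{pmatrix}
\phi''(1)^2 \begin{pmatrix} I_{\theta_T} & -I_{\theta_T} \\ -I_{\theta_T} & I_{\theta_T} \end{pmatrix}
\begin{pmatrix} \frac{I_{\theta_T}^{-1}}{\phi''(1)} & 0 \\ 0 & -\frac{I_{\theta_T}^{-1}}{\phi''(1)} \end{pmatrix}
&= \begin{pmatrix} I_{\theta_T}^{-1} & I_{\theta_T}^{-1} \\ I_{\theta_T}^{-1} & I_{\theta_T}^{-1} \end{pmatrix}.
\end{align*}
```

Under these assumptions both estimates share the same first-order expansion $I_{\theta_T}^{-1}\,\sqrt{n}\,P_n\left(\dot p_{\theta_T}/p_{\theta_T}\right)$, which is why each of them has limiting covariance $I_{\theta_T}^{-1}$, as stated in theorem 3.5.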

Proof of theorem 13.61 (a) Using condition (A. 5) and (|6.11|) . we can write 

Un{9,9T) ■■= Un{9T,9T)+o{n-'/^) {a.s.) 
and 

Vn{9, 9t) := K(^T, 9t) + 0(n-i/3) ^a.s.), (6.15) 

uniformly on G B{9t, n^'^^)- On the other hand, for any a = 9t + un^^^^ with \u\ < 1, 
by a Taylor expansion using condition (A. 5), we obtain 

nPnh{9, a) - nPnh{9, 9t) = n^^^u^Un{9, 9t) + 2-'n''^u^Vn{9 , 9t)u + 0(1) (a.s.) 

uniformly on ^ G B{9T,n~^^^) and u with |n| < 1. Combining this with (j6.14p and (j6.15p 
to see that 



nPnh{9, a) - nPnh{9, 9t) = n^/^u^Un{9T, 9t) + 2-^n^/^u^Vn{9T, 9t)u + o{r?/^) {a.s.) 
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uniformly on G B{6T,n~^f^) and u with \u\ < 1. Now, from this, using the fact that 
UnieT,9T) = 0(n-i/2 (log log n)i/2) (a.s.) and K(0t,^t) = - S (Ot , Ot) + 0(1) (a.s.), we 
obtain 

nPnh{9,a)-nPnh{9,9T) = O (n^/^ (log log n)^/^) - V5(0T, ^t)^ + o(n^/^) (a.s.) 

(6.16) 

uniformly on ^ G B{9T,n^^^^) and u with \u\ < 1. Hence, uniformly on a in the surface 
of the ball B{9T,'n^^^^) (i-e., uniformly on u with \u\ = 1), we have 

nPnh{e,a)-nPnh{e,eT) < O (n^/'' (log log n)^/^^ - 2^ V"(l)cn^^^ + o(n^/^) (a.s.) (6.17) 

(uniformly on € B(9t-, n^"^/^)) where c > is the smallest eigenvalue of the matrix Iq^ = 
4>" {\)~^ S{9t, 9t)- Hence, by the continuity of the function a ^ nPnh{9, a) — nPnh{9, 6t) 
and since it takes value zero when a = 9t and is asymptotically negative with respect 
to a on the surface of B, it holds that, as n tends to 00, with probability one, the 
function a 1— > Pnh{9,a) attains it maximum value at some point a^{9) in the interior of 
B{9t, n-i/3), and this holds for all 9 G B{9t, n"^/^). Further, since ()6.16p holds uniformly 
on G B(9T,n~^^^), we conclude that 

a^{9) -9t = 0{n-^/^) {a.s.) uniformly on G B{9t, n~^/^). (6.18) 

We now prove that, as n ^ 00, with probability one, the function 9 ^ Pn{9, a^{9)) attains 
its minimum value at some point 9(f, in the interior of the ball B(9T,n~^^^). Here, afj,{9) 
is any value in the interior of B{9T,n^^^^) which maximizes a Pnh{9,a). It exists by 
the above arguments. For any 9 = 9t + vn~^/^ with |?;| < 1, by a Taylor expansion of 
nPnh{9,a^{9)) in 9 and a^{9) around 9t , and a Taylor expansion nPnh{9T,o:(j,{9T)) in 
oc<j>{9T) around 6't, using ()6.18p and (I6.11|) . we obtain 

nPnh{9, a^{9)) - nPnh{9T, a^{9T)) = n^'\^ Pn{d /d9)h{9T, 9t) + 

2"ini/V [Pnid"^ /d9'^)h{9T,9T)]v + o{n^/^) {a.s.) 

uniformly on v with < 1. Hence, from this, using the fact that 

Pn{d/d9)h{9T,9T) = O (n-i/2(iogiog^)i/2) (a.s.) and Pn{dyd9^)h{9T,9T) = </'"(l)V + 
0(1) (a.s.), we conclude that 

nPnh{9,a^{9))-nPnh{9T,a4>{9T)) = O (n^/'^iloglogn)^/^^ +2-^(1)" {l)v^Ig^vn^/^+o{n^/^) (a.s.) 

uniformly on v with \v\ < 1. Hence, uniformly on 9 in the surface of the ball B{9j', W^^^) 
(i.e., uniformly on v with \v\ = 1), we obtain 

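\[
n P_n h(\theta, \hat\alpha_\phi(\theta)) - n P_n h(\theta_T, \hat\alpha_\phi(\theta_T)) \ge O\left(n^{1/6}(\log\log n)^{1/2}\right) + 2^{-1}\phi''(1)\, c\, n^{1/3} + o(n^{1/3}) \quad (a.s.)
\]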
where $c > 0$ is the smallest eigenvalue of $I_{\theta_T}$. This implies that

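\[
n^{2/3} P_n h(\theta, \hat\alpha_\phi(\theta)) - n^{2/3} P_n h(\theta_T, \hat\alpha_\phi(\theta_T)) \ge O\left(n^{-1/6}(\log\log n)^{1/2}\right) + 2^{-1}\phi''(1)\, c + o(1) \quad (a.s.)
\]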

uniformly on $\theta$ on the surface of the ball $B(\theta_T, n^{-1/3})$. The left hand side of the above display equals zero when $\theta = \theta_T$ and is positive when $\theta$ is on the surface of the ball $B(\theta_T, n^{-1/3})$ (for $n$ sufficiently large). This implies that, as $n \to \infty$, with probability one, the function $\theta \mapsto P_n h(\theta, \hat\alpha_\phi(\theta))$ attains its minimum value at some point $\hat\theta_\phi$ in the interior of the ball $B(\theta_T, n^{-1/3})$. This concludes the proof of part (a).
(b) See the proof of theorem 3.5.
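The orders of magnitude used in part (a) can be checked term by term. The following worked step is only a sketch that restates the bounds of the proof; it assumes $\phi''(1) > 0$ and uses the symbols $v$, $c$ and $I_{\theta_T}$ exactly as above.

```latex
\begin{align*}
n^{2/3}\, v^T P_n \tfrac{\partial}{\partial\theta} h(\theta_T,\theta_T)
  &= n^{2/3}\, O\!\left(n^{-1/2} (\log\log n)^{1/2}\right)
   = O\!\left(n^{1/6} (\log\log n)^{1/2}\right), \\
2^{-1} n^{1/3}\, \phi''(1)\, v^T I_{\theta_T} v
  &\ge 2^{-1} \phi''(1)\, c\, n^{1/3}
   \qquad \text{for } |v| = 1, \\
n^{-1/3}\, O\!\left(n^{1/6} (\log\log n)^{1/2}\right)
  &= O\!\left(n^{-1/6} (\log\log n)^{1/2}\right) \longrightarrow 0 .
\end{align*}
```

After division by $n^{1/3}$, the lower bound therefore tends to $2^{-1}\phi''(1)c > 0$, which is why the difference is eventually positive on the surface of the ball.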

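Proof of theorem 3.7. We have
\[
\hat D_\phi(\Theta_0,\theta_T) := \inf_{\beta \in B_0} \sup_{\alpha \in \Theta} P_n h(s(\beta),\alpha) = P_n h(s(\tilde\beta),\tilde\alpha),
\]
in which, as in the proof of theorem 3.3, $s(\tilde\beta)$ and $\tilde\alpha$ are solutions of the system of equations
\[
P_n \frac{\partial}{\partial\beta} h(s(\tilde\beta),\tilde\alpha) = 0, \qquad P_n \frac{\partial}{\partial\alpha} h(s(\tilde\beta),\tilde\alpha) = 0.
\]
In the first equation the partial derivative is intended w.r.t. the first variable $\beta$ in $s(\beta)$ and in the second one w.r.t. the second variable $\alpha$. A Taylor expansion of $P_n \frac{\partial}{\partial\beta} h(s(\beta),\alpha)$ and $P_n \frac{\partial}{\partial\alpha} h(s(\beta),\alpha)$ in a neighborhood of $(\beta_T,\theta_T)$ gives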



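\[
\begin{bmatrix} -P_n \frac{\partial}{\partial\beta} h(s(\beta_T),\theta_T) \\[4pt] -P_n \frac{\partial}{\partial\alpha} h(s(\beta_T),\theta_T) \end{bmatrix} = \Bigl[\,\cdots\,\Bigr]\, b_n + o_P(1), \tag{6.19}
\]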



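where $b_n := \left( (\tilde\beta - \beta_T)^T, (\tilde\alpha - \theta_T)^T \right)^T$. This implies that $b_n = O_P(n^{-1/2})$. So, by a Taylor expansion of $\hat D_\phi(\Theta_0,\theta_T)$ around $(\beta_T,\theta_T)$, we obtain
\[
\frac{2n}{\phi''(1)} \hat D_\phi(\Theta_0,\theta_T) = U_n^T A^{-1} U_n - V_n^T B^{-1} V_n + o_P(1), \tag{6.20}
\]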



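where
\[
A := -\frac{1}{\phi''(1)} P_{\theta_T} \frac{\partial^2}{\partial\alpha^2} h(s(\beta_T),\theta_T), \qquad B := \frac{1}{\phi''(1)} P_{\theta_T} \frac{\partial^2}{\partial\beta^2} h(s(\beta_T),\theta_T).
\]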
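By (6.12), it holds $A = I_{\theta_T}$. On the other hand,
\[
\frac{\partial}{\partial\beta} h(s(\beta_T),\theta_T) = [S(\beta_T)]^T \frac{\partial}{\partial s(\beta)} h(s(\beta_T),\theta_T).
\]
Moreover, using the fact that $\phi'(1) = 0$, we can see that $\frac{\partial}{\partial s(\beta)} h(s(\beta_T),\theta_T) = -\frac{\partial}{\partial\alpha} h(s(\beta_T),\theta_T)$, which implies
\[
P_n \frac{\partial}{\partial\beta} h(s(\beta_T),\theta_T) = [S(\beta_T)]^T \left[ -P_n \frac{\partial}{\partial\alpha} h(s(\beta_T),\theta_T) \right].
\]
In the same way, we obtain
\[
P_{\theta_T} \frac{\partial^2}{\partial\beta^2} h(s(\beta_T),\theta_T) = [S(\beta_T)]^T \left[ -P_{\theta_T} \frac{\partial^2}{\partial\alpha^2} h(s(\beta_T),\theta_T) \right] [S(\beta_T)].
\]
It follows that $V_n = [S(\beta_T)]^T U_n$ and $B = [S(\beta_T)]^T I_{\theta_T} S(\beta_T)$. Combining this result with (6.20), we get
\[
\frac{2n}{\phi''(1)} \hat D_\phi(\Theta_0,\theta_T) = U_n^T \left[ I_{\theta_T}^{-1} - S(\beta_T) \left( [S(\beta_T)]^T I_{\theta_T} S(\beta_T) \right)^{-1} [S(\beta_T)]^T \right] U_n + o_P(1),
\]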



which is precisely the asymptotic expression for the Wilks likelihood ratio statistic for composite hypotheses. The proof is therefore completed by following the same arguments as for the Wilks likelihood ratio statistic; see e.g. Sen and Singer (1993), chapter 5.
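The Wilks-type form rests on a linear-algebra fact: writing $M = I_{\theta_T}^{-1} - S(S^T I_{\theta_T} S)^{-1} S^T$ with $S = S(\beta_T)$, the matrix $M I_{\theta_T}$ is idempotent with trace $d - l$. If $U_n$ is asymptotically centred normal with covariance $I_{\theta_T}$ (the usual setting of the Wilks argument, taken here as an assumption), this is what produces a chi-square limit with $d - l$ degrees of freedom. The sketch below is a numerical sanity check of that fact only, not part of the original proof; the matrices `info` and `jac` are arbitrary illustrative stand-ins for $I_{\theta_T}$ and $S(\beta_T)$ with $d = 3$ and $l = 1$, and all function names are hypothetical.

```python
from fractions import Fraction


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination (exact, via Fraction)."""
    n = len(a)
    aug = [list(map(Fraction, row)) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        # Find a row with a non-zero pivot and move it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [x / piv for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def wilks_matrix(info, jac):
    """Return M = info^{-1} - jac (jac^T info jac)^{-1} jac^T."""
    jt = transpose(jac)
    middle = inverse(matmul(matmul(jt, info), jac))
    correction = matmul(matmul(jac, middle), jt)
    inv = inverse(info)
    return [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(inv, correction)]


# Illustrative stand-ins (assumptions, not taken from the paper):
# a positive definite 3x3 matrix and a 3x1 Jacobian of full column rank.
info = [[2, 1, 0], [1, 3, 1], [0, 1, 2]]
jac = [[1], [2], [-1]]

m = wilks_matrix(info, jac)
mi = matmul(m, info)                    # the product M * info
idempotent = matmul(mi, mi) == mi       # projection property, checked exactly
trace = sum(mi[i][i] for i in range(len(mi)))
print(idempotent, trace)                # trace should equal d - l
```

Exact rational arithmetic is used so that the idempotence check is an equality rather than a floating-point comparison.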



Proof of theorem 3.8. The proofs of parts (a) and (b) are similar to the proofs of parts (a) and (b) of theorem 3.7; hence they are omitted.



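(c) Using (3.4) and (3.14), we can see that $D_\phi(\Theta_0,\theta_T)$ can be written as
\[
D_\phi(\Theta_0,\theta_T) := \inf_{\beta \in B_0} D_\phi(s(\beta),\theta_T) = D_\phi(s(\beta^*),\theta_T) = \sup_{\alpha \in \Theta} P_{\theta_T} h(s(\beta^*),\alpha) = P_{\theta_T} h(s(\beta^*),\theta_T). \tag{6.21}
\]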



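On the other hand, by a Taylor expansion of $\hat D_\phi(\Theta_0,\theta_T) = P_n h(s(\tilde\beta), \hat\alpha_\phi(\tilde\beta))$ in $\tilde\beta$ and $\hat\alpha_\phi(\tilde\beta)$ around $\beta^*$ and $\theta_T$, we obtain
\[
\hat D_\phi(\Theta_0,\theta_T) = P_n h(s(\beta^*),\theta_T) + o_P(n^{-1/2}).
\]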

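Combining this with (6.21), we conclude that
\[
\sqrt{n}\left[ \hat D_\phi(\Theta_0,\theta_T) - D_\phi(\Theta_0,\theta_T) \right] = \sqrt{n}\left[ P_n h(s(\beta^*),\theta_T) - P_{\theta_T} h(s(\beta^*),\theta_T) \right] + o_P(1)
\]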



which, by the CLT, converges to a centred normal variable with variance
\[
\sigma_\phi^2(\beta^*,\theta_T) = P_{\theta_T} h(s(\beta^*),\theta_T)^2 - \left( P_{\theta_T} h(s(\beta^*),\theta_T) \right)^2.
\]
This ends the proof.
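The limiting variance has the form "mean of squares minus squared mean", so its empirical analogue is easy to write down. The sketch below is a hypothetical helper, not a procedure from the paper: it assumes that the values $h(s(\beta^*),\theta_T)(X_i)$ have already been computed (in practice with $\beta^*$ and $\theta_T$ replaced by estimates) and are passed in as a plain list named `h_values`.

```python
import math


def plug_in_variance(h_values):
    """Empirical analogue of P h^2 - (P h)^2 for a list of h-values."""
    n = len(h_values)
    mean = sum(h_values) / n
    mean_sq = sum(v * v for v in h_values) / n
    return mean_sq - mean * mean


def standard_error(h_values):
    """Estimated standard deviation of the divergence estimate: sigma_hat / sqrt(n)."""
    return math.sqrt(plug_in_variance(h_values) / len(h_values))


# Toy input (illustrative numbers only, not data from the paper).
h_values = [1.0, 2.0, 3.0, 4.0]
var_hat = plug_in_variance(h_values)
se_hat = standard_error(h_values)
print(var_hat, se_hat)
```

The divisor is $n$ rather than $n - 1$, matching the population formula above; the two choices agree asymptotically.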